Use case: Voice call broadcast with prerecorded and text-to-speech (TTS) messages

May 1, 2018 5 min
Aivis Olsteins

In this article we will discuss how to set up a voice call broadcast without any knowledge of telephony protocols and without installing any telephony or Voice over IP equipment.

Let's start with a simple case: you want the system to call a user whose SIP address is 123456@192.168.1.1 and play a prerecorded WAV message stored at the following address: http://server.domain.com/voices/hello.wav.

With our API the request is as simple as this:

POST /voice/call/play

{
  "to": "sip:123456@192.168.1.1:5060",
  "from": "me",
  "playlist":
    [
      {
        "play": "http://server.domain.com/voices/hello.wav"
      }
    ]
}

The response will look like this:

{
  "code": 0,
  "status": 200,
  "data": "accepted",
  "request_id": "9976f46a-243f-4fc6-a441-ebfda38cfad4"
}

Here is what happens:

  1. The request is accepted and given an ID, which is sent back to your application.
  2. The system makes a SIP call to 123456@192.168.1.1.
  3. When the other side picks up the phone, the system plays the voice file from the URL http://server.domain.com/voices/hello.wav.
  4. After the file has been played in full, the system hangs up.
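The request above can be sketched in Python using only the standard library. This is a minimal illustration, not the official client: the host `https://api.example.com`, the `Content-Type: application/json` header, and the helper name `build_play_request` are assumptions, and any authentication the API may require is left out because the article does not describe it.

```python
import json
import urllib.request

# Hypothetical host: the article does not name the API base URL.
API_BASE = "https://api.example.com"


def build_play_request(to, caller, audio_url):
    """Build (but do not send) a POST /voice/call/play request."""
    payload = {
        "to": to,
        "from": caller,
        "playlist": [{"play": audio_url}],
    }
    return urllib.request.Request(
        API_BASE + "/voice/call/play",
        data=json.dumps(payload).encode("utf-8"),
        # Assumed content type for the JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_play_request(
    "sip:123456@192.168.1.1:5060",
    "me",
    "http://server.domain.com/voices/hello.wav",
)
```

The request object is only constructed here; passing it to `urllib.request.urlopen(request)` would actually send it.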

Now, how do you know whether the call was successful? At the time the API returned the request ID, the call had not yet been established and its outcome was unknown. To find out, use the request ID in another request that checks what happened to the call:

GET /voice/call/status/9976f46a-243f-4fc6-a441-ebfda38cfad4


Note that this is a GET request and therefore has no data payload. The request ID received in the previous step goes into the URL.
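As a rough Python sketch, the status request is just a GET with the request ID appended to the path. The host `https://api.example.com` and the helper name `build_status_request` are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical host: the article does not name the API base URL.
API_BASE = "https://api.example.com"


def build_status_request(request_id):
    """Build (but do not send) a GET /voice/call/status/<request_id> request."""
    # Percent-encode the ID so it is always safe to place in the URL path.
    path = "/voice/call/status/" + quote(request_id, safe="")
    # No data argument: a GET request carries no payload.
    return urllib.request.Request(API_BASE + path, method="GET")


status_request = build_status_request("9976f46a-243f-4fc6-a441-ebfda38cfad4")
```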

Depending on the outcome of the call, the response will be something like this:


{
  "code": 0,
  "status": 200,
  "data": "accepted",
  "request_id": "9976f46a-243f-4fc6-a441-ebfda38cfad4",
  "record":
    {
      "type": "voice",
      "result": "completed",
      "request_time": "2018-05-01 13:35:32",
      "to": "sip:123456@192.168.1.1:5060",
      "from": "me",
      "setup_time": "2018-05-01 13:35:32",
      "connect_time": "2018-05-01 13:35:45",
      "disconnect_time": "2018-05-01 13:36:02",
      "disconnect_cause_code": "200",
      "disconnect_cause_text": "OK",
      "duration": 17,
      "answered": true
    }
}
 

The response above tells us that the call was successful and lasted 17 seconds, i.e. the duration of the recording. Note that if you send the status request early, while the call is not yet established or is still in progress, you will get different output, and some fields (such as disconnect time, cause, and duration) will not be present at all. Likewise, if the user did not pick up the phone, the set of response attributes will differ (for example, connect time will not be present).
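One way to handle these cases in client code is to look at which timestamps are present in the "record" object. The sketch below assumes exactly the behaviour described above (no disconnect time until the call has ended, no connect time if nobody answered); the function names and the three labels are illustrative and not part of the API. It also recomputes the duration from the timestamps in the sample response as a sanity check.

```python
from datetime import datetime

# Timestamp layout used in the sample status response.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def classify_call(record):
    """Classify a status record by which timestamps it contains."""
    if "disconnect_time" not in record:
        return "not finished"   # call not yet established or still in progress
    if "connect_time" not in record:
        return "not answered"   # call ended without ever being connected
    return "answered"


def talk_seconds(record):
    """Seconds between connect and disconnect, or None if either is missing."""
    if "connect_time" not in record or "disconnect_time" not in record:
        return None
    connected = datetime.strptime(record["connect_time"], TIME_FORMAT)
    disconnected = datetime.strptime(record["disconnect_time"], TIME_FORMAT)
    return int((disconnected - connected).total_seconds())


# Relevant fields from the sample response.
record = {
    "connect_time": "2018-05-01 13:35:45",
    "disconnect_time": "2018-05-01 13:36:02",
    "duration": 17,
}
```

For the sample record, the difference between 13:35:45 and 13:36:02 is 17 seconds, which agrees with the reported "duration".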

Now, let's look at a case where you want the call to play several files, one after another. Here is an example request:


POST /voice/call/play

{
  "to": "sip:123456@192.168.1.1:5060",
  "from": "me",
  "playlist":
    [
      {
        "play": "http://server.domain.com/voices/hello.wav"
      },
      {
        "play": "http://server.domain.com/voices/main-message.wav"
      },
      {
        "play": "http://server.domain.com/voices/goodbye-thanks.wav"
      }
    ]
}

The difference from the previous request is that the “playlist” now contains several entries instead of one. They are played in sequence, in the order they appear in the request: “hello.wav”, then “main-message.wav”, then “goodbye-thanks.wav”.

Now, let's say you need to include some dynamic text in the message, for which you do not have a prerecorded file. In this case, you can add a text-to-speech item to the “playlist”, like this:

POST /voice/call/play

{
  "to": "sip:123456@192.168.1.1:5060",
  "from": "me",
  "playlist":
    [
      {
        "play": "http://server.domain.com/voices/hello.wav",
        "type": "remote"
      },
      {
        "play": "http://server.domain.com/voices/main-message.wav",
        "type": "remote"
      },
      {
        "play": "Punctuality is the virtue of the bored",
        "type": "tts",
        "options":
          {
            "language": "en-US",
            "gender": "female"
          }
      },
      {
        "play": "http://server.domain.com/voices/goodbye-thanks.wav",
        "type": "remote"
      }
    ]
}

As you can see from the example above, we have added a “type” attribute to the playlist items. It distinguishes between files stored on remote servers (“remote”) and text that needs to be converted to speech (“tts”). A TTS item can also carry options, such as the language and gender of the speaker.
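When the spoken text changes from call to call, it can be convenient to assemble the playlist in code. The following Python sketch builds the same structure as the request above; the helper names (`remote_item`, `tts_item`, `build_playlist`) are hypothetical, and only the option keys shown in the article ("language" and "gender") are used.

```python
def remote_item(url):
    """Playlist entry for an audio file hosted on a remote server."""
    return {"play": url, "type": "remote"}


def tts_item(text, language="en-US", gender="female"):
    """Playlist entry for text that is converted to speech."""
    return {
        "play": text,
        "type": "tts",
        "options": {"language": language, "gender": gender},
    }


def build_playlist(dynamic_text):
    """Prerecorded greeting and message, dynamic TTS text, then a goodbye."""
    return [
        remote_item("http://server.domain.com/voices/hello.wav"),
        remote_item("http://server.domain.com/voices/main-message.wav"),
        tts_item(dynamic_text),
        remote_item("http://server.domain.com/voices/goodbye-thanks.wav"),
    ]


playlist = build_playlist("Punctuality is the virtue of the bored")
```

The resulting list can be placed under the "playlist" key of the request body as is.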

The API can also play voice files stored locally on the platform. This allows faster call establishment, lower bandwidth use, and higher availability, since playback no longer depends on fetching the file from a remote server. To do this, first upload the required file with an API call:


POST /storage/media

Content-Type: audio/wav
Content-Length: 2845

raw audio file content


The server will respond with something like this:

{
  "code": 0,
  "status": 200,
  "data": "accepted",
  "request_id": "cc078956-f4a6-4494-b418-f0facc2f1203",
  "file":
    {
      "id": "9ba994ea-a0b5-47cb-af69-fb0a235d7b19",
      "size": 2845,
      "type": "audio/wav"
    }
}
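The upload can be sketched in Python as a POST whose body is the raw audio bytes. As before, the host `https://api.example.com` and the helper name `build_upload_request` are assumptions, and the payload below is placeholder data of the same length as in the example, not a real WAV file.

```python
import urllib.request

# Hypothetical host: the article does not name the API base URL.
API_BASE = "https://api.example.com"


def build_upload_request(audio_bytes, content_type="audio/wav"):
    """Build (but do not send) a POST /storage/media request with a raw body."""
    return urllib.request.Request(
        API_BASE + "/storage/media",
        data=audio_bytes,
        headers={
            "Content-Type": content_type,
            "Content-Length": str(len(audio_bytes)),
        },
        method="POST",
    )


# Placeholder payload: 2845 zero bytes standing in for the audio file content.
upload_request = build_upload_request(b"\x00" * 2845)
```

In real use the bytes would come from reading the audio file in binary mode.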

Once all the necessary files are uploaded, we can send our call request like this:


POST /voice/call/play

{
  "to": "sip:123456@192.168.1.1:5060",
  "from": "me",
  "playlist":
    [
      {
        "play": "9ba994ea-a0b5-47cb-af69-fb0a235d7b19",
        "type": "local"
      },
      {
        "play": "http://server.domain.com/voices/goodbye-thanks.wav",
        "type": "remote"
      }
    ]
}

Note that we use the “id” from the “file” object in the upload response, not the “request_id” of the upload request itself.
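Because the two identifiers are easy to mix up, a small helper can make the choice explicit. This Python sketch takes the upload response shown earlier and builds the matching “local” playlist item; the function name `local_item_from_upload` is hypothetical.

```python
def local_item_from_upload(upload_response):
    """Build a "local" playlist entry from a /storage/media upload response."""
    # Use the ID of the stored file, not the request_id of the upload call.
    file_id = upload_response["file"]["id"]
    return {"play": file_id, "type": "local"}


# Upload response as shown in the article.
upload_response = {
    "code": 0,
    "status": 200,
    "data": "accepted",
    "request_id": "cc078956-f4a6-4494-b418-f0facc2f1203",
    "file": {
        "id": "9ba994ea-a0b5-47cb-af69-fb0a235d7b19",
        "size": 2845,
        "type": "audio/wav",
    },
}

item = local_item_from_upload(upload_response)
```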

In the next article we will see how to send calls to multiple users at the same time and how to use more advanced call routing features.
