Building script driven Telephony engine

February 18, 2018 10 min
Aivis Olsteins

Ever since we started offering IVR for calling card and callback services, back in 2004 or so, the work became a never-ending process of adding new features, changing behavior, adding one optional feature after another, and sometimes rewriting everything from scratch. User requirements varied significantly, so we had to create a number of configuration files, and later configuration screens, so that our clients could set up their systems the way they preferred. All of it was, of course, reasonable: sometimes you want the system to announce the balance and sometimes you don't; sometimes you want the remaining time instead (hours and minutes, or only minutes). Then somebody wants the caller ID saved automatically so that the user does not need to enter a PIN on the next call. In turn, you need to provide an option to erase the saved caller ID via DTMF in case the user called from a public phone. Then came requests for shortcut keys for speed dialing, balance retrieval, recharge, language change, redial, and so on. The list would have gone on forever had the calling card market itself not declined.

However, the story did not stop there. We got requests to create specialized IVRs that were not related to calling cards: IVRs for inbound call centers, customer support, various automated answering systems, and so on. While we didn't have to build each case from scratch, each one still required a new module, which in the end was a relatively slow process.

Later came the idea of building a universal, GUI-based system that would let non-technical people put together the behavior they needed by clicking, dragging, and dropping on the screen. While that works in theory, the approach has a couple of drawbacks:

First, it is still quite hard for a non-technical person to understand how to put the pieces together: when the IVR prompt can be played, how to get the string of key presses from DTMF, how to run conditional tasks, and so on.

Second, even a simple script needs a lot of building blocks. In the end the diagram becomes extremely large, and you cannot see the whole picture. Designing a larger system becomes nearly impossible.

So in the end we concluded that the best approach would be to go back to the text editor and console and let people write their scripts. Take a programming language that is easy and popular enough for most technically literate people to understand its syntax and logic, build a parser for it, add domain-specific functions and classes, and you have quite a powerful tool in your hands.

First, which language to choose? After some thought, the choice fell on JavaScript, due to its popularity, availability, easy-to-understand structure, and large range of ready-made parsers. Consider this small snippet:

 

playMedia({
  source:"welcome_leave_message.wav"
});

recordMedia({
  target:"recording.wav",
  maxlen:120,
  stop:"#"
});


It is easy to read, and you immediately get the idea: the system plays a prerecorded greeting and then lets the caller record a message that is at most 2 minutes long (maxlen of 120 seconds) and can be ended earlier by pressing the pound key (#).
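One way to picture what could sit behind such a script: the host exposes playMedia and recordMedia as ordinary functions that validate their options and queue a command for the telephony engine. The following is only a minimal sketch under that assumption. The command queue, the shape of the command objects, the default values, and the validation rules are all hypothetical and are not taken from the actual engine.

```javascript
// Hypothetical host-side bindings: each call validates its options and
// appends a command object to a queue that a telephony engine would consume.
const commands = [];

function playMedia(options) {
  if (!options || typeof options.source !== "string") {
    throw new TypeError("playMedia: 'source' must be a file name");
  }
  commands.push({ action: "play", source: options.source });
}

function recordMedia(options) {
  if (!options || typeof options.target !== "string") {
    throw new TypeError("recordMedia: 'target' must be a file name");
  }
  // Assumed defaults for this sketch: 60 seconds, no stop key.
  const { target, maxlen = 60, stop = null } = options;
  if (!Number.isInteger(maxlen) || maxlen <= 0) {
    throw new RangeError("recordMedia: 'maxlen' must be a positive number of seconds");
  }
  commands.push({ action: "record", target, maxlen, stop });
}

// The script from the article, unchanged.
playMedia({
  source: "welcome_leave_message.wav"
});

recordMedia({
  target: "recording.wav",
  maxlen: 120,
  stop: "#"
});

// Shows the two queued commands, in the order the script issued them.
console.log(commands);
```

Keeping the script functions this thin means the script only describes what should happen, while timing and media handling stay with the engine that drains the queue.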

Standard constructs of the language can be used, such as assignments, string functions, and flow control. For example, to detect the language from the caller ID prefix:

 

if (caller_id.startswith("34")) {
  lang = "es";
}
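The same idea extends naturally to several prefixes. The sketch below is an illustration only: the prefix table, the fallback language, and the detectLanguage helper are hypothetical names introduced here. Note also that standard JavaScript spells the string method startsWith (capital W), which is what this sketch uses; an engine with its own interpreter could of course offer a startswith spelling as in the snippet above.

```javascript
// Hypothetical prefix table: country calling code -> prompt language.
// The entries are illustrative assumptions, not taken from a real deployment.
const LANGUAGE_BY_PREFIX = {
  "34": "es",  // Spain
  "33": "fr",  // France
  "49": "de",  // Germany
  "371": "lv", // Latvia
};

// Return the language for the longest matching prefix, or the fallback.
function detectLanguage(callerId, fallback = "en") {
  // Strip a leading "+" or "00" international prefix if present.
  const digits = String(callerId).replace(/^(\+|00)/, "");
  const prefixes = Object.keys(LANGUAGE_BY_PREFIX)
    .sort((a, b) => b.length - a.length); // try the longest prefix first
  for (const prefix of prefixes) {
    if (digits.startsWith(prefix)) {
      return LANGUAGE_BY_PREFIX[prefix];
    }
  }
  return fallback;
}

console.log(detectLanguage("34600000000"));  // es
console.log(detectLanguage("+37160000000")); // lv
console.log(detectLanguage("12125550100"));  // en (no prefix matched)
```

Trying the longest prefix first matters once the table contains codes where one is a prefix of another; with a flat if chain that ordering has to be maintained by hand.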



Using one of the tools available for parsing JavaScript, we get a relatively straightforward structure for these expressions. For example, parsing the snippet above with Esprima gives the following:


{
            test: {
                callee: {
                    property: {
                        type: "Identifier",
                        name: "startswith"
                    },
                    object: {
                        type: "Identifier",
                        name: "caller_id"
                    },
                    type: "MemberExpression",
                    computed: false
                },
                type: "CallExpression",
                arguments: [
                    {
                        raw: "\"34\"",
                        type: "Literal",
                        value: "34"
                    }
                ]
            },
            type: "IfStatement",
            consequent: {
                body: [
                    {
                        expression: {
                            operator: "=",
                            right: {
                                raw: "\"es\"",
                                type: "Literal",
                                value: "es"
                            },
                            type: "AssignmentExpression",
                            left: {
                                type: "Identifier",
                                name: "lang"
                            }
                        },
                        type: "ExpressionStatement"
                    }
                ],
                type: "BlockStatement"
            }
        }

The parser actually produces a JSON-like tree that describes the structure of the entire script. This tree can then be fed into a state machine which, together with the signals from the telephony engine, processes it block by block and issues control messages for the call flow.
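To make the block-by-block idea more concrete, here is a minimal sketch of a tree walker for the structure shown above. It is deliberately simpler than what the article describes: it runs synchronously and does not wait for telephony signals, so it is not the prototype's state machine. The execute and evaluate functions and the STRING_METHODS table are hypothetical names, and the tree is copied by hand (only the fields the walker needs) so that the sketch runs without a parser installed.

```javascript
// Hand-written copy of the tree shown above, reduced to the fields used here.
const ifNode = {
  type: "IfStatement",
  test: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      object: { type: "Identifier", name: "caller_id" },
      property: { type: "Identifier", name: "startswith" },
    },
    arguments: [{ type: "Literal", value: "34" }],
  },
  consequent: {
    type: "BlockStatement",
    body: [{
      type: "ExpressionStatement",
      expression: {
        type: "AssignmentExpression",
        operator: "=",
        left: { type: "Identifier", name: "lang" },
        right: { type: "Literal", value: "es" },
      },
    }],
  },
};

// Assumed table of string methods that the engine chooses to support.
const STRING_METHODS = new Map([
  ["startswith", (text, prefix) => text.startsWith(prefix)],
]);

// Evaluate an expression node against the script's variables.
function evaluate(node, vars) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      return vars[node.name];
    case "AssignmentExpression":
      vars[node.left.name] = evaluate(node.right, vars);
      return vars[node.left.name];
    case "CallExpression": {
      const target = evaluate(node.callee.object, vars);
      const method = STRING_METHODS.get(node.callee.property.name);
      if (!method) {
        throw new Error("Unsupported method: " + node.callee.property.name);
      }
      const args = node.arguments.map((arg) => evaluate(arg, vars));
      return method(target, ...args);
    }
    default:
      throw new Error("Unsupported expression: " + node.type);
  }
}

// Execute a statement node; a block runs its statements in order.
function execute(node, vars) {
  switch (node.type) {
    case "IfStatement":
      if (evaluate(node.test, vars)) execute(node.consequent, vars);
      else if (node.alternate) execute(node.alternate, vars);
      break;
    case "BlockStatement":
      node.body.forEach((statement) => execute(statement, vars));
      break;
    case "ExpressionStatement":
      evaluate(node.expression, vars);
      break;
    default:
      throw new Error("Unsupported statement: " + node.type);
  }
}

const spanishCall = { caller_id: "34600000000", lang: "en" };
execute(ifNode, spanishCall);
console.log(spanishCall.lang); // es

const otherCall = { caller_id: "44200000000", lang: "en" };
execute(ifNode, otherCall);
console.log(otherCall.lang); // en
```

Because every node carries a type field, supporting another construct comes down to adding one more case, and anything the engine does not want scripts to use is rejected by the default branch.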

Going into the details of this process is not the purpose of this article; however, we have successfully built a prototype that does what is described above. We have added several telephony-specific classes, for example:

  1. collect DTMF
  2. play media
  3. record media
  4. send SMS

along with control flow structures:

  1. if else
  2. while
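Put together, these pieces are enough for a small menu. The script below is a sketch of how that might look, not code from the prototype: the function names collectDtmf and sendSms, their option names, and the prompt file names are assumptions, and the three functions are stubbed so that the example runs outside a telephony engine (collectDtmf reads from a scripted list of key presses instead of a live call).

```javascript
// Hypothetical stand-ins for the engine's telephony functions. They only
// record what they were asked to do.
const log = [];
const scriptedKeys = ["9", "", "2"]; // simulated caller input, one entry per prompt

function playMedia({ source }) {
  log.push("play " + source);
}

function collectDtmf({ maxdigits }) {
  const keys = scriptedKeys.shift() || "";
  return keys.slice(0, maxdigits);
}

function sendSms({ to, text }) {
  log.push("sms " + to + ": " + text);
}

// The script itself: offer a two-option menu, with up to three attempts.
const caller_id = "34600000000";
let attempts = 0;
let choice = "";

while (attempts < 3 && choice !== "1" && choice !== "2") {
  playMedia({ source: "main_menu.wav" });
  choice = collectDtmf({ maxdigits: 1 });
  attempts = attempts + 1;
}

if (choice === "1") {
  playMedia({ source: "opening_hours.wav" });
} else if (choice === "2") {
  sendSms({ to: caller_id, text: "We will call you back shortly." });
} else {
  playMedia({ source: "goodbye.wav" });
}

console.log(log);
```

With the scripted input above, the caller presses an invalid key, then nothing, then 2, so the menu prompt is played three times before the SMS is sent.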

I will share more details of the project in upcoming posts. In the meantime, if you are interested in becoming a beta tester, please contact us via our Contact Form and describe your specific case.

 

