VoiceXML is an emerging industry standard for delivering web content and services over the telephone, including information, entertainment, games and business services.
This article introduces the voice services industry, potential voice applications, and the VoiceXML architecture. Kelsey Group estimates point to a huge market opportunity in voice applications, and VoiceXML is the right standard for taking advantage of it: it enables rapid development, ensures portability and leverages existing Internet infrastructure, which makes it compelling for developers to adopt VoiceXML for building voice applications.
Because everything in VoiceXML depends on sound, it is a natural area for audio developers to apply their expertise in audio quality, production techniques and efficient use of bandwidth. For web developers, VoiceXML is based on XML, whose tag syntax is very similar to HTML, so the migration is relatively easy and the coding opportunities are abundant.
Voice Services Industry
Traditional Interactive Voice Response (IVR) applications have been deployed in enterprises for decades, but they have faced serious limitations, including poor usability and an inability to go beyond providing access to proprietary information.
Over the last two years, the need to access Internet content from anywhere gave rise to wireless and voice recognition-based applications, as well as Internet appliances. Some of these applications are available only on special devices such as WAP phones and on Internet appliances such as WebTV.
However, the only device needed to access voice services is an ordinary telephone. According to IDC, the installed base of telephones (both wireless and landline) was about 1.6 billion in 2000, which makes the telephone a ubiquitous device that can play a key role in providing access to web content and services. The success of this "Voice Web" depends on robust voice recognition software and significant computing resources. Over the last four years, improved algorithms have made voice recognition software viable for commercial applications, and the availability of faster, cheaper microprocessors has helped further.
IDC predicts that the total number of telephones, both wireless and landline, will grow to 3.1 billion worldwide by 2005. The Kelsey Group estimates that by then about 440 million people will use voice applications and that the market for speech applications will reach $4.5 billion.
Potential Voice Services
The following categories have the potential to use voice applications to provide content and services through the telephone. Many of these applications can be linked together to offer a comprehensive set of services.
Voice Application Benefits
Voice applications offer several benefits that include the following:
Role of VoiceXML Standard
The developer community has an opportunity to take advantage of the huge voice applications market. Standards-based technologies are vital to developers because they ensure portability across vendors and leverage existing Internet infrastructure. VoiceXML is well on its way to becoming the standard that fulfills these needs. The VoiceXML Forum is the industry organization that represents the VoiceXML user community. The World Wide Web Consortium (W3C) has published the VoiceXML 1.0 Specification as a W3C Note, and the 2.0 Specification is at the preliminary draft stage.
The benefits of developing in VoiceXML include:
Several vendors have implemented the VoiceXML 1.0 Specification along with their own extensions. The BeVocal Café is one such VoiceXML development platform, and was recently ranked the #1 VoiceXML development environment and hosting service by the independent testing firm CT Labs. The Café is a free web-based environment that provides developers with all of the tools and resources needed for VoiceXML development. It also offers a hosting platform that supplies the telephony infrastructure, voice recognition software and other software components that VoiceXML applications need in order to run, along with 24/7 operations support. By using such platforms, developers can avoid the large capital expenditure and effort of building out their own infrastructure.
VoiceXML is an XML language defined specifically for voice applications, and its grammar is specified by a document type definition (DTD). The 1.0 Specification details all the tags that are part of this DTD. It also covers the architectural model for VoiceXML implementations, the form interpretation algorithm, and the scope of VoiceXML.
The graphic shows the architectural model of VoiceXML in the BeVocal platform. There are three components: a web server, the VoiceXML interpreter context, and the implementation platform. The web server in the graphic can be any web server on the Internet. The interpreter context contains the VoiceXML interpreter, which is responsible for interpreting VoiceXML code, and provides all the supporting functions the interpreter needs.
The VoiceXML interpreter sends parameter values to the web server as part of each request and receives a VoiceXML document as the response. Because the web server only has to return documents, any server-side scripting technology, such as Perl, ASP, JSP or PHP, can be used to create VoiceXML documents dynamically.
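Since the server's only job is to turn request parameters into a VoiceXML document, the generating side can be very small. The sketch below uses Python (not one of the technologies named above) and only its standard library; the `city` parameter, the `forecast` form id and the prompt wording are hypothetical, and wiring the function into an actual web server or CGI handler is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def render_vxml(query_string: str) -> str:
    """Build a VoiceXML 1.0 document from the interpreter's request parameters.

    The "city" parameter and the "forecast" form id are hypothetical.
    """
    params = parse_qs(query_string)
    # parse_qs maps each name to a list of values; use the first, or a default.
    city = params.get("city", ["your city"])[0]

    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form", id="forecast")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    # ElementTree escapes &, < and > in text, so user input cannot break the XML.
    prompt.text = f"Here is the forecast for {city}."

    body = ET.tostring(vxml, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body


if __name__ == "__main__":
    print(render_vxml("city=Boston"))
```

Building the document with ElementTree rather than by string concatenation means that characters such as `&` arriving in request parameters are escaped automatically, so the interpreter always receives well-formed XML.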
The VoiceXML interpreter and the VoiceXML interpreter context work with an implementation platform that includes other infrastructure components such as a telephony switch, voice recognition software, and a text-to-speech (TTS) engine. This implementation platform is responsible for connecting to the Public Switched Telephone Network (PSTN), performing voice recognition, playing audio files, and other supporting functions. Because the implementation platform provides voice recognition, the details of voice recognition are hidden from VoiceXML code.
VoiceXML is independent of the implementation platform on which an application might be developed or deployed. This offers flexibility to developers as they are not restricted to one implementation platform.
A VoiceXML Example
The VoiceXML DTD has 47 tags in the 1.0 Specification. Each tag has a set of valid children and parents. A tag also has a set of attributes that control its behavior. For a complete list of tags, please refer to the VoiceXML 1.0 Specification. For a comprehensive VoiceXML reference, you can visit http://cafe.bevocal.com/docs/vxml/index.html.
This article introduces a few important VoiceXML tags through an example: a simple form that takes user input and, once the input matches, invokes another form.
The illustrated tags include: <vxml>, <form>, <field>, <block>, <prompt>, <goto>, <break>, <audio>, <if>, and <else>.
The first two lines of the example's source code are specific to XML. The XML declaration (<?xml version="1.0"?>) identifies the XML version as 1.0. The second line, the document type declaration, gives the URL of the DTD. This example uses the BeVocal Café DTD.
VoiceXML code is enclosed in <vxml> and </vxml> tags. The version attribute of the <vxml> tag identifies the version of the VoiceXML Specification. It is set to 1.0, as this is the current version.
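A minimal document skeleton, held in a Python string so it can be checked with the standard-library XML parser, might look like the following. The DOCTYPE public identifier and DTD URL are placeholders, not the BeVocal Café DTD used in the article's example, and the `vxml_version` helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Placeholder DOCTYPE: the public identifier and DTD URL are hypothetical,
# not the BeVocal Café DTD referenced in the article's example.
SKELETON = """<?xml version="1.0"?>
<!DOCTYPE vxml PUBLIC "-//Example//DTD VoiceXML 1.0//EN" "http://example.com/vxml.dtd">
<vxml version="1.0">
  <form id="welcome">
    <block>
      <prompt>Welcome.</prompt>
    </block>
  </form>
</vxml>
"""


def vxml_version(document: str) -> Optional[str]:
    """Return the version attribute declared on the root <vxml> element."""
    # The standard-library parser does not fetch the external DTD.
    root = ET.fromstring(document)
    if root.tag != "vxml":
        raise ValueError(f"expected a <vxml> root element, found <{root.tag}>")
    return root.get("version")


if __name__ == "__main__":
    print(vxml_version(SKELETON))
```

Note that the XML declaration must be the very first thing in the document, which is why the string starts immediately after the opening quotes.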
The <form> tag is used to collect user input and perform associated actions. In the example, the id of the <form> is "welcome". The id attribute is important because it lets program control return to the same form, for example with <goto next="#welcome"/>. To play a prompt to the user within the <form> tag, we use the <block> tag, which contains executable content. We can specify a prompt using the <prompt> tag. A WAV file can also be played as a prompt using the <audio> tag, with the file named in the src attribute. The VoiceXML 1.0 Specification allows Uniform Resource Locators (URLs) to be used as file paths. If the audio file is not found, the VoiceXML interpreter speaks the text enclosed in the <audio> tag through the text-to-speech (TTS) engine. We used the <break> tag to introduce a pause after playing the prompt.
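The audio fallback rule can be illustrated with a small Python model. The VoiceXML held in `WELCOME` is a hypothetical reconstruction, not the article's original listing: the file name and wording are made up, and `msecs` is, to our understanding, the VoiceXML 1.0 `<break>` attribute (VoiceXML 2.0 uses `time` instead). `render_prompt` is a toy that only mimics the play-or-speak decision described above; it is not how a real interpreter works.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the welcome form; the file name and wording
# are made up. msecs is the VoiceXML 1.0 <break> attribute (2.0 uses time).
WELCOME = """<?xml version="1.0"?>
<vxml version="1.0">
  <form id="welcome">
    <block>
      <prompt>
        <audio src="welcome.wav">Welcome to the sound demo.</audio>
        <break msecs="500"/>
      </prompt>
    </block>
  </form>
</vxml>
"""


def render_prompt(document: str, available_files: set) -> list:
    """Toy model of prompt playback: list what a caller would hear, in order.

    An <audio> element plays its src file if that file is available;
    otherwise its enclosed text is handed to TTS. Plain prompt text outside
    <audio> is ignored to keep the model short.
    """
    root = ET.fromstring(document)
    actions = []
    for element in root.iter():  # document order, depth-first
        if element.tag == "audio":
            src = element.get("src")
            if src in available_files:
                actions.append(f"play {src}")
            else:
                actions.append(f"tts {(element.text or '').strip()}")
        elif element.tag == "break":
            actions.append(f"pause {element.get('msecs')}ms")
    return actions


if __name__ == "__main__":
    print(render_prompt(WELCOME, {"welcome.wav"}))
    print(render_prompt(WELCOME, set()))
```

With `{"welcome.wav"}` available, the model plays the file and then pauses; with an empty set it falls back to speaking the enclosed text before the same pause.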
The steps to gather input and take action depending on the match are as follows:
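A minimal sketch of these steps follows, assuming the built-in boolean field type so that no platform-specific grammar syntax is needed. The forms `welcome` and `sample`, the field name `hear` and the prompts are hypothetical. The `<field>` prompts the caller, `<filled>` runs once the answer is recognized, `<if>`/`<else/>` chooses a branch, and `<goto>` transfers control to the other form, which later returns to `welcome`. The Python helper `branch_target` is a toy that shows how `<else/>` works: it is an empty element that splits the children of `<if>` into a true branch and a false branch.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical dialog: "welcome" asks a yes/no question using the built-in
# boolean field type; "sample" plays a clip and returns to "welcome".
DIALOG = """<?xml version="1.0"?>
<vxml version="1.0">
  <form id="welcome">
    <field name="hear" type="boolean">
      <prompt>Would you like to hear a sample? Say yes or no.</prompt>
      <filled>
        <if cond="hear">
          <goto next="#sample"/>
        <else/>
          <prompt>Goodbye.</prompt>
        </if>
      </filled>
    </field>
  </form>
  <form id="sample">
    <block>
      <prompt>
        <audio src="sample.wav">This is where the sample would play.</audio>
      </prompt>
      <goto next="#welcome"/>
    </block>
  </form>
</vxml>
"""


def branch_target(document: str, form_id: str, answer: bool) -> Optional[str]:
    """Return the <goto> target selected by the <if> inside a form's <filled>.

    Toy model: assumes one <if> whose cond is simply the boolean field, so the
    recognized answer picks the branch. Returns None if that branch has no <goto>.
    """
    root = ET.fromstring(document)
    form = root.find(f"form[@id='{form_id}']")
    if_element = form.find("field/filled/if")
    in_true_branch = True
    for child in if_element:
        if child.tag == "else":
            # <else/> is empty: it only marks where the false branch begins.
            in_true_branch = False
        elif child.tag == "goto" and in_true_branch == answer:
            return child.get("next")
    return None


if __name__ == "__main__":
    print(branch_target(DIALOG, "welcome", True))   # caller said "yes"
    print(branch_target(DIALOG, "welcome", False))  # caller said "no"
```

When the caller says "no", the false branch only plays "Goodbye." and, with no further transition, the document ends.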
This article introduced the voice services industry, potential voice applications, and the VoiceXML architecture. Kelsey Group estimates point to a huge market opportunity in voice applications, and VoiceXML is the right standard for taking advantage of it: it enables rapid development, ensures portability and leverages existing Internet infrastructure, which makes it compelling for developers to adopt VoiceXML for building voice applications.
VoiceXML will interest both web developers and audio producers.