MobilED server documentation
MobilED Platform Documentation - Version 1.0
Prepared by: Francois Aucamp, Meraka Institute, CSIR. Date: 31 March 2006
Introduction
This document describes the architecture, layout and implementation of the current MobilEd platform. It also provides information on the usage of the platform and on possible future developments.
The MobilEd platform employs three main technology platforms to achieve its goal:
* an SMS communication interface/gateway, such as Kannel (http://www.kannel.org) or Alamin (http://www.alamin.org/), to send and receive SMS messages,
* the Asterisk Open Source PBX (http://www.asterisk.org/) for audio telephony communications, and
* a MediaWiki (http://www.mediawiki.org/) server with suitable content, such as en.wikipedia.org.
This report mainly focuses on the software that was developed to combine these technologies.
Goals
The goal of this platform is to enable the usage of a public information source, such as the Wikipedia on-line encyclopedia, via an audio-based mobile device, such as a cellular telephone. This is accomplished by making use of speech-based technologies such as text-to-speech (TTS) engines and interactive voice response (IVR) applications. Users of the system will also be able to contribute data to the information source by recording and uploading audio clips via their telephones.
This will enable communities with little or no conventional access to the Internet to interact with and gain knowledge from state-of-the-art information sources.
Design and Architecture
Use cases
A typical high-level use case of the system is provided in Figure 1, below.
http://www2.uiah.fi/~tleinone/mobiled/figure_01.png
Figure 1: Simple high-level usage scenario (user's perspective).
The user of the system uses the standard text messaging capability of his/her cellphone (SMS) to request an article from the MobilEd platform, by entering the title of the requested article and sending the message to the phone number assigned to the MobilEd platform implementation (this is installation-specific).
The MobilEd platform responds by calling the user and delivering the requested article via an audio telephone call. This is accomplished using a platform for interactive voice response applications, and human language technology components such as a text-to-speech engine. It reads the article to the user using a computer-generated voice in combination with pre-recorded voice prompts, and accepts input from the user via DTMF/Touchtone key presses. The user is also allowed to contribute to the information source (local MediaWiki server) by recording his/her voice over the telephone.
A more detailed use case diagram, focussing more on the systems and processes employed by the MobilEd platform itself, is given in Figure 2.
http://www2.uiah.fi/~tleinone/mobiled/figure_02.png
Figure 2: Use case diagram (platform perspective).
Implementation
The core of the MobilEd platform is the Asterisk Software PBX (http://www.asterisk.org). It is responsible for all telephone call management, and the speech-based article reader presented to the user over the audio telephone call is an application that runs on top of the PBX. All other communication, such as HTTP requests/posts to the MediaWiki server and SMS communication, is described relative to the Asterisk PBX.
All of the required source code for this project was developed in Python, except where otherwise indicated. The nature of the requirements meant that a scripting language such as Python was best suited to the solution, because of the required interconnectivity between different technology platforms and the ease of debugging and maintaining the code.
SMS communication
A GSM modem (or similar device, such as a cellphone connected to a PC), as well as an SMS gateway application capable of executing external scripts/programs, is required for SMS communication. The gateway is solely responsible for all external communication with the GSM network; internally, it must simply execute a Python script, providing it with the number of the user's cellular phone, and the contents of the SMS, i.e. the name of the requested article. For the MobilEd prototype, the Alamin SMS gateway was used, due to the simplicity of its setup.
The script that is executed by the SMS gateway formats the cellphone number provided by the gateway, so that it is a valid local number (i.e. it removes the +27 international dialing code for South Africa), and it adds a leading 0 to the number - this was done because the (physical) PBX systems in use at the CSIR require a leading 0 to any number to request an outgoing line. In a different deployment scenario, this will have to be modified.
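The number formatting described above could be sketched as follows. This is a minimal illustration and not the project's actual script: the function name and its parameters are hypothetical, and it assumes that the country code is replaced by the national trunk prefix "0" before the outside-line digit is prepended. The text does not state whether the original script produces one or two leading zeros, so both prefixes are kept configurable.

```python
def format_local_number(msisdn, country_code="27",
                        trunk_prefix="0", outside_line_prefix="0"):
    """Convert a number supplied by the SMS gateway into a PBX dial string.

    All names and defaults are illustrative assumptions.
    """
    number = msisdn.strip().replace(" ", "")
    international = "+" + country_code
    if number.startswith(international):
        # Drop "+27" and restore the national trunk prefix.
        number = trunk_prefix + number[len(international):]
    # Prepend the digit that requests an outgoing line from the physical PBX.
    return outside_line_prefix + number
```

For a deployment without a physical PBX in front of Asterisk, passing outside_line_prefix="" would leave the national number unchanged.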
After formatting the user's telephone number, the script initiates a TCP/IP socket connection to Asterisk, which may be running on a different computer. It uses the Asterisk Manager API to originate a "virtual call" on the Asterisk Console Channel, which instructs Asterisk to run a specific dialplan, and thus execute the main component of the MobilEd platform. This is discussed in section 3.2.2.
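A rough sketch of such a Manager API request is given here. The Manager API exchanges plain-text "Key: Value" lines, with an empty line ending each action, and listens on TCP port 5038 by default. Everything else in the sketch is an assumption made for illustration: the function names, the context name "mobiled-download", the variable names, and the console channel name "Console/dsp" are not taken from the project's source.

```python
import socket


def ami_action(name, headers):
    """Serialise one Manager API action as CRLF-terminated "Key: Value" lines."""
    lines = ["Action: %s" % name]
    lines += ["%s: %s" % (key, value) for key, value in headers]
    # An empty line marks the end of the action.
    return "\r\n".join(lines) + "\r\n\r\n"


def build_request(username, secret, article, phone_number):
    """Build a login, an Originate on the console channel, and a logoff."""
    return "".join([
        ami_action("Login", [("Username", username), ("Secret", secret)]),
        ami_action("Originate", [
            ("Channel", "Console/dsp"),       # assumed console channel name
            ("Context", "mobiled-download"),  # hypothetical dialplan context
            ("Exten", "s"),
            ("Priority", "1"),
            # How several variables are passed has differed between Asterisk
            # versions; repeated headers are an assumption here.
            ("Variable", "ARTICLE=%s" % article),
            ("Variable", "USER_NUMBER=%s" % phone_number),
        ]),
        ami_action("Logoff", []),
    ])


def send_request(host, request, port=5038, timeout=10.0):
    """Send the prepared actions to the manager port and return the first reply."""
    with socket.create_connection((host, port), timeout=timeout) as connection:
        connection.sendall(request.encode("utf-8"))
        return connection.recv(4096).decode("utf-8", "replace")
```

Keeping the message construction separate from the socket code means the request text can be inspected or logged without a running Asterisk server.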
Asterisk dialplan and the MobilEd AGI interface
The main component of the MobilEd platform, which is responsible for all communication with the MediaWiki server as well as presenting the user with the necessary IVR menus for navigation, is implemented as an Asterisk Gateway Interface (AGI) program. It is executed by Asterisk, and communicates with Asterisk to provide the user with an interactive, voice-based version of the information it retrieves from the MediaWiki server.
Asterisk uses dialplans internally to direct its call flow; these dialplans are written in an Asterisk-specific configuration language, or can be created using one of several available tools, such as the CSIR's DialogPalette (http://dialogpalette.sourceforge.net).
The Asterisk dialplan developed for this project consists of two main parts:
* The initial "virtual call" dialplan: This dialplan instructs the MobilEd AGI application to download the requested article from the MediaWiki server. After the article is downloaded, the dialplan schedules an outgoing telephone call to the user.
* The outgoing dialplan: This is the dialplan that is executed when the user answers the Asterisk server's outgoing call (scheduled by the initial dialplan). It again launches the MobilEd AGI application, but this time instructs it to read the article, not download it. This is done in this 2-step fashion so that the user does not need to wait for the article to be downloaded after answering the phone call.
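The two-step arrangement can be illustrated with a small sketch. It assumes that the dialplan passes a mode ("download" or "read") to the AGI application; the mode names, the function name and the in-memory dictionary standing in for the article cache are all hypothetical.

```python
ARTICLE_CACHE = {}  # stands in for the on-disk article cache


def handle_request(mode, title, fetch, speak):
    """Run one of the two dialplan steps; `fetch` and `speak` are injected callables."""
    if mode == "download":
        # Step 1 (virtual call): do the slow network work before the user is called.
        ARTICLE_CACHE[title] = fetch(title)
        return "schedule-call"
    if mode == "read":
        # Step 2 (user has answered): only local data is needed, so reading
        # can start immediately.
        speak(ARTICLE_CACHE[title])
        return "done"
    raise ValueError("unknown mode: %r" % mode)
```

Because the read step touches only the cache, the time the user spends on the line does not depend on the speed of the MediaWiki server.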
The file extensions.conf in the /etc/asterisk directory (by default) of the Asterisk server contains the Asterisk dialplan(s) to be used. The dialplans used for the MobilEd pilot are optimized for the hardware that was used during the tests (a Digium Wildcard TDM400P/TD04B); they make use of analog telephone interface channels (e.g. incoming/outgoing telephone lines) via the Zaptel driver interface, and poll the telephony interface card for available channels as needed. If no channel is available (i.e. all the available channels are in use), the outgoing call request is placed in a queue.
MobilEd application structure
The main MobilEd AGI application consists of four major components:
* an Asterisk AGI interface component,
* a Wikipedia article management component,
* an article parser and reader component, and
* the main functional section of the program, including IVR menu structures.
Figure 3 provides a high-level overview of the MobilEd application layout, and how it interconnects with the different technology platforms.
http://www2.uiah.fi/~tleinone/mobiled/figure_03.png
Figure 3: Component layout of MobilED AGI application.
These components are discussed in detail in the following subsections.
Asterisk-MobilEd AGI interface
In order to communicate with Asterisk, the MobilEd AGI application wraps the basic Asterisk AGI functionality required for this project in an API contained in a single class called agiWrapper.
The agiWrapper class defines functions for executing generic Asterisk instructions, as well as reading and interpreting Asterisk's responses. Some frequently used functionality is included in the API as well, such as methods for speaking text aloud via a text-to-speech engine, getting DTMF input and playing audio files.
The agiWrapper class is solely responsible for all communication to and from the Asterisk server; without it, the application would have no contact with Asterisk, and thus no way of manipulating a telephone call. It specifies how operations in Asterisk are to be performed, but not what is to be performed - this is done by the main functional section, discussed in section 3.2.3.4.
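To make the role of such a wrapper concrete, a minimal sketch of an AGI session class is shown here. It is not the project's agiWrapper: the class name AgiSession and its method names are hypothetical. It relies only on the general AGI conventions that Asterisk first sends a block of "agi_...: value" lines ended by an empty line, that commands are written to standard output, and that a successful reply has the form "200 result=N".

```python
import re
import sys


class AgiSession:
    """Hypothetical stand-in for agiWrapper, covering a few AGI commands."""

    _RESULT = re.compile(r"^200 result=(-?\d+)")

    def __init__(self, stdin=sys.stdin, stdout=sys.stdout):
        self.stdin = stdin
        self.stdout = stdout
        self.env = self._read_environment()

    def _read_environment(self):
        """Read the "agi_...: value" lines Asterisk sends when the script starts."""
        env = {}
        while True:
            line = self.stdin.readline().strip()
            if not line:  # an empty line (or end of input) ends the block
                return env
            key, _, value = line.partition(":")
            env[key.strip()] = value.strip()

    def execute(self, command):
        """Send one AGI command and return the numeric result of the reply."""
        self.stdout.write(command + "\n")
        self.stdout.flush()
        reply = self.stdin.readline().strip()
        match = self._RESULT.match(reply)
        if match is None:
            raise RuntimeError("unexpected AGI reply: %r" % reply)
        return int(match.group(1))

    def play(self, sound, escape_digits=""):
        """Play a sound file; return the key that interrupted it, or ""."""
        result = self.execute('STREAM FILE %s "%s"' % (sound, escape_digits))
        return chr(result) if result > 0 else ""

    def wait_for_digit(self, timeout_ms=5000):
        """Wait for a DTMF key; return it as a string, or "" on timeout."""
        result = self.execute("WAIT FOR DIGIT %d" % timeout_ms)
        return chr(result) if result > 0 else ""
```

Passing the input and output streams to the constructor allows the class to be exercised with in-memory streams, without a live telephone call.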
Wikipedia article management
This component is responsible for all communication with the MediaWiki server; it handles article downloads and information uploads, and is responsible for maintaining a local article cache. It is contained in a class named wikipediaManager.
The primary function of this class is the retrieval of articles, wrapped by the class's get(ArticleName) method. It uses the MediaWiki server's XML-based article export ability to download an article source XML document, which is then parsed and the article text extracted. After a successful download, the article text is formatted via the _formatArticle() method, which strips the article of all MediaWiki markup, embedded images, links, etc., and prepares the article for processing by a text-to-speech engine. During this process, an index is built up of all the sections contained in the article text (as indicated by the MediaWiki markup), which is prepended to the text itself - this index is later used by the real-time article reader component, discussed in section 3.2.3.3. The method also checks the contents of the text for redirection markers; if any are found, the text formatting is cancelled and the target article (of the redirection) is downloaded.
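The processing steps described above (extracting the text from the export XML, detecting redirects, stripping markup and building a section index) might look roughly like the following. This is a simplified sketch and not the project's _formatArticle() method: all function names are hypothetical, the regular expressions cover only common markup and do not handle nested constructs, and the index is returned separately rather than prepended to the text, since the original index format is not documented here.

```python
import re
import xml.etree.ElementTree as ET

REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)
HEADING = re.compile(r"^(={2,})\s*(.*?)\s*\1\s*$")


def extract_wikitext(export_xml):
    """Return the contents of the first <text> element of an export document."""
    root = ET.fromstring(export_xml)
    for element in root.iter():
        # Tags are namespace-qualified ("{uri}text"); compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "text":
            return element.text or ""
    return None


def redirect_target(wikitext):
    """Return the redirect target title, or None for an ordinary article."""
    match = REDIRECT.match(wikitext)
    return match.group(1).strip() if match else None


def strip_markup(line):
    """Remove common wiki markup from one line (nested markup is not handled)."""
    line = re.sub(r"\[\[(?:Image|File):[^\]]*\]\]", "", line)       # embedded images
    line = re.sub(r"\{\{[^{}]*\}\}", "", line)                      # simple templates
    line = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", line)   # [[target|label]]
    line = re.sub(r"\[https?://\S+\s+([^\]]*)\]", r"\1", line)      # [url label]
    line = re.sub(r"'{2,}", "", line)                               # bold and italics
    line = re.sub(r"<[^>]+>", "", line)                             # stray HTML tags
    return line.strip()


def format_article(wikitext):
    """Split an article into (title, lines) sections and return (index, sections)."""
    # "Introduction" is an assumed label for the text before the first heading.
    sections = [("Introduction", [])]
    for raw in wikitext.splitlines():
        heading = HEADING.match(raw.strip())
        if heading:
            sections.append((strip_markup(heading.group(2)), []))
            continue
        text = strip_markup(raw)
        if text:
            sections[-1][1].append(text)
    index = [title for title, _ in sections]
    return index, sections
```

A caller would check redirect_target() first and, if it returns a title, request that article instead of formatting the current text.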
After the text formatting/indexing is complete, the article is added to a local cache of articles in plain-text format for later use.
Uploads to the MediaWiki server are currently done through an external Python script called mediawiki_upload.py. This functionality should really be implemented in the wikipediaManager class itself; the script is therefore considered to be a part of the Wikipedia article management component. It is used to upload user-added audio recordings to the MediaWiki server and to add the necessary related article text links, and is initiated in the MobilEd application by the main functional component, discussed in section 3.2.3.4. It uploads files by employing an HTTP-bot-like mechanism, which "fills in" the required data on the actual MediaWiki server's pages via HTTP POST/GET actions.
Article parser and reader
This component loads the formatted article text (as provided by the wikipediaManager class, discussed in section 3.2.3.2), and provides an API for navigating quickly through the article. This functionality is wrapped into the articleReader class.
The articleReader class provides, on demand, sentences from the current article (and article section); internally, it keeps track of the position in the article text that the user is currently "viewing" (i.e. listening to via TTS). It breaks the article up into sections (as specified by the wikipediaManager class's _formatArticle() method), and provides methods for retrieving article sentences one by one, in order, either backwards or forwards. Article sections are opened in order as needed, but a specific section can also be opened explicitly.
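The navigation behaviour described above can be sketched as a cursor over sectioned sentences. The class below is a hypothetical stand-in for articleReader, not its actual code: the class and method names are invented for illustration, and the sentence splitter is deliberately naive (it will, for example, split after abbreviations).

```python
import re


class SentenceReader:
    """Hypothetical stand-in for articleReader: a cursor over sectioned sentences."""

    def __init__(self, sections):
        # `sections` is a list of (title, text) pairs.
        self.sections = [
            (title, [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])
            for title, text in sections
        ]
        self.section = 0
        self.position = -1  # index of the sentence most recently returned

    def open_section(self, number):
        """Jump to the start of the given section."""
        if not 0 <= number < len(self.sections):
            raise IndexError("no such section: %d" % number)
        self.section = number
        self.position = -1

    def next_sentence(self):
        """Return the following sentence, opening the next section if needed."""
        sentences = self.sections[self.section][1]
        if self.position + 1 < len(sentences):
            self.position += 1
            return sentences[self.position]
        if self.section + 1 < len(self.sections):
            self.open_section(self.section + 1)
            return self.next_sentence()
        return None  # end of the article

    def previous_sentence(self):
        """Return the preceding sentence, stepping back a section if needed."""
        if self.position > 0:
            self.position -= 1
            return self.sections[self.section][1][self.position]
        if self.section > 0:
            self.section -= 1
            # Place the cursor just past the end of that section and step back.
            self.position = len(self.sections[self.section][1])
            return self.previous_sentence()
        return None  # start of the article
```

Returning None at either end of the article gives the calling code a simple signal for presenting an end-of-page menu.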
The articleReader class allows attachments to be appended to any section of the article. This is done by creating and maintaining an index of data elements added to the article through this interface; this data index is only present in the locally cached Wikipedia articles. The articleReader class is not responsible for synchronization with the MediaWiki server - this is done by the Wikipedia article management component, as described in section 3.2.3.2. In other words, two uploads are done for every recording made: one to the article reader, which stores the audio as a waveform (as required by the Asterisk server for playback over the telephone), and one to the MediaWiki server as an Ogg Vorbis-encoded audio file (as required by the project specifications). Asterisk itself does not support the playback of Ogg Vorbis-encoded audio files.
Main functional section
The core of the MobilEd AGI application consists of several functions, each describing a different IVR menu (such as the indexMenu() and endOfPageMenu() functions), and some procedural code. This is what is executed when the application is launched, and it is responsible for initializing the other three components and starting the main event loop via the readArticle() function.
It coordinates the call flow by sending/receiving messages to/from Asterisk through the AGI interface component, and executes the correct instructions based on information received from the article reader and Wikipedia manager components. This provides the IVR prompts, menus and interactivity that the user encounters while in a call.
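A menu function of this kind could be structured along the following lines. The sketch is illustrative only: the key assignments, prompts and action names are invented, and the two callables stand in for the text-to-speech and DTMF methods of the AGI interface component.

```python
def run_menu(options, get_digit, say, max_attempts=3):
    """Speak the options, wait for a key press, and return the chosen action name.

    `options` maps a DTMF key to a (spoken label, action name) pair.
    Returns None if no valid key is pressed within `max_attempts` rounds.
    """
    for _ in range(max_attempts):
        for key, (label, _action) in sorted(options.items()):
            say("Press %s to %s." % (key, label))
        digit = get_digit()
        if digit in options:
            return options[digit][1]
        say("Sorry, that is not a valid choice.")
    return None


# Hypothetical end-of-page menu; the real endOfPageMenu() may differ.
END_OF_PAGE_MENU = {
    "1": ("hear the section again", "repeat"),
    "2": ("go to the next section", "next"),
    "3": ("record a comment", "record"),
    "0": ("return to the index", "index"),
}
```

The main loop would then map the returned action name to the corresponding call on the article reader or the Wikipedia manager.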
Text-to-speech
The text-to-speech engine used in the project is the Flite text-to-speech engine. Flite is a smaller, faster, runtime version of Festival, but Festival may also be used; this is selected in the MobilEd AGI application by setting TTS_APPLICATION to "Festival".
In order to use Flite with Asterisk, an add-on expansion module needs to be installed; this module was developed at the CSIR, and can be downloaded from the DialogPalette website: http://dialogpalette.sourceforge.net
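As an illustration of how a setting such as TTS_APPLICATION can switch between engines, the sketch below builds a command line for either engine's stand-alone command-line tool. This is an assumption made purely for illustration: the project itself speaks through the Asterisk add-on module mentioned above, and the function name and the value "Flite" for the default setting are hypothetical.

```python
TTS_APPLICATION = "Flite"  # set to "Festival" to use Festival instead


def tts_command(text_file, wav_file, application=None):
    """Return the argument list for rendering a text file to a WAV file."""
    application = application or TTS_APPLICATION
    if application == "Flite":
        # flite reads its input from a file (-f) and writes a waveform (-o).
        return ["flite", "-f", text_file, "-o", wav_file]
    if application == "Festival":
        # text2wave is the batch synthesis script shipped with Festival.
        return ["text2wave", text_file, "-o", wav_file]
    raise ValueError("unsupported TTS application: %r" % application)
```

The returned list can be passed to subprocess.run() on a machine where the chosen engine is installed.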
Installation overview
Currently, no formal installation process exists. Given the nature and programming language of the developed software, packaging it in a standard format such as RPM or Debian packages should not present too many problems.
What still needs to be addressed is the configuration and localization of the project. Currently, all server information (such as the address of the local MediaWiki server) is hard-coded; since the program is written in Python, editing the script itself in a text editor is sufficient, and it is doubtful whether an external configuration file is necessary.
Asterisk needs to be installed and configured to run on the telephony hardware that is installed - this is dependent on the actual hardware itself. The dialplan (extensions.conf file) developed for MobilEd must also be configured to make use of the appropriate channels (and number of channels), and copied to the /etc/asterisk directory. The Flite TTS module must be downloaded and installed.
A MediaWiki server must be set up, and configured to allow for the uploading of Ogg Vorbis audio files. The MobilEd AGI script (wikipediareader.py) must be modified to point to this server, by setting the variables WIKIPEDIA_EXPORT (for the global Wikipedia.org, if required) and LOCAL_WIKIPEDIA_EXPORT (for the local MediaWiki server).
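By way of example, the two settings might be defined and used as shown here. The values are assumptions: MediaWiki serves article exports from its Special:Export page, but the local server address is a placeholder, the helper function is hypothetical, and the exact form of these settings in wikipediareader.py is not documented in this report.

```python
from urllib.parse import quote

# Export base URLs; the local address is a placeholder for the deployed server.
WIKIPEDIA_EXPORT = "http://en.wikipedia.org/wiki/Special:Export/"
LOCAL_WIKIPEDIA_EXPORT = "http://localhost/mediawiki/index.php/Special:Export/"


def export_url(title, use_local=True):
    """Return the export URL for an article title on the chosen server."""
    base = LOCAL_WIKIPEDIA_EXPORT if use_local else WIKIPEDIA_EXPORT
    # MediaWiki titles use underscores in place of spaces.
    return base + quote(title.strip().replace(" ", "_"))
```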
An SMS gateway must be installed and set to run the Python script mobiled-asterisk-client.py with the correct parameters as discussed in section 3.2.1. Alamin is a good example of such a gateway, but another, such as Kannel, will also work.
Source code
The relevant source files are available here: