The Persona project at Microsoft Research was an attempt to advance the state of the art in user interfaces by making them more engaging, assistive, and natural. Early interfaces were command-line based, and later gave way to the WIMP (Windows, Icons, Menus, Pointer) interfaces that are still prevalent today. In the Persona project, we aimed to develop interface technology that would open up computers to the large number of users who remain intimidated and confused by WIMP interfaces, by creating a 3D animated agent that would respond to real conversational requests.
The numerous components of Persona included:
- Speech recognition - based on the Whisper system developed by the speech group at Microsoft Research
- Natural language understanding - based on the MS-NLP project, also developed at Microsoft Research. This component used rules to convert natural language into a canonical semantic representation.
- Semantic template matching - to match the canonical semantic representation to application-related intent.
- Dialogue modeling - to enable the agent to respond and pose follow-up queries naturally and flexibly over a sequence of natural language exchanges.
- Animation sequencing - the Player (and earlier ReActor) subsystems controlled the 3D animation of the agent.
- Speech control - to provide natural-sounding verbal output from the agent. We experimented with both text-to-speech and voice-talent-supplied speech, and chose the latter for its vastly improved realism. Lip synchronization was calculated by running this audio through the speech recognition component.
For more information on the components and how they worked together, please read the chapter from the book "Software Agents", cited below.
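To make the flow concrete, below is a toy sketch of how a spoken request might pass through these components in sequence. Every class, function, and template here is a hypothetical illustration; the actual Persona component interfaces were internal to Microsoft Research and are not reproduced.

```python
# Toy sketch of the Persona pipeline. Every name below is hypothetical;
# the real component interfaces were internal to Microsoft Research.

from dataclasses import dataclass, field

@dataclass
class SemanticForm:
    """Canonical semantic representation (output of the NLP component)."""
    predicate: str
    arguments: dict = field(default_factory=dict)

def recognize_speech(audio):
    # Stand-in for the Whisper recognizer: audio in, transcript out.
    return "play some jazz"

def parse_to_semantics(text):
    # Stand-in for MS-NLP's rule-based conversion to a canonical form.
    words = text.split()
    return SemanticForm(predicate=words[0], arguments={"object": words[-1]})

# Semantic templates: canonical predicates mapped to application intents.
TEMPLATES = {
    ("play", "hear"): "PLAY_MUSIC",
    ("find", "search"): "SEARCH_LIBRARY",
}

def match_intent(form):
    for predicates, intent in TEMPLATES.items():
        if form.predicate in predicates:
            return intent
    return "UNKNOWN"

def handle_utterance(audio):
    text = recognize_speech(audio)     # speech recognition
    form = parse_to_semantics(text)    # natural language understanding
    intent = match_intent(form)        # semantic template matching
    # From here, the dialogue model would choose a response, the animation
    # sequencer would play the agent's motions, and the speech component
    # would produce lip-synced audio output.
    return intent

print(handle_utterance(b"raw-audio-bytes"))  # -> PLAY_MUSIC
```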
The sequence of animations that could be performed at any given time was initially specified as a state machine. As the universe of the agent's actions grew, specifying this sequencing became increasingly complex and error-prone, because each of the many animation scripts could only be executed in a very specific context. For example, if the agent were in a deep sleep state and received a request to search for a particular tune in the library, the agent would need to wake up and stand first; otherwise, the resulting animation would be completely wrong. DJ Kurlander's particular role on the project was to develop the new animation sequencer. To do so, he developed an Artificial Intelligence (AI) planning-based specification, in which all animation scripts included preconditions and postconditions. Since AI-style planning can be slow, DJ added a precompilation step in which all the planning was done up front and converted into a state machine that would execute efficiently at run-time. This work is described in his CHI '95 paper.
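As a rough sketch of this plan-then-compile idea, the fragment below assumes states are sets of boolean facts and gives each animation script precondition, add, and delete sets (in the style of classical STRIPS planning). The script names, state encoding, and compilation scheme are illustrative guesses, not the actual implementation from the CHI '95 paper.

```python
# Minimal sketch of planning-based animation sequencing with up-front
# compilation. All script names and the state encoding are illustrative
# guesses, not the actual Persona implementation.

from collections import deque

# script name -> (preconditions, facts added, facts removed)
SCRIPTS = {
    "wake_up": ({"asleep"}, {"awake"}, {"asleep"}),
    "stand":   ({"awake"}, {"standing"}, set()),
    "search":  ({"awake", "standing"}, {"found_tune"}, set()),
}

def apply_script(state, script):
    """Return the successor state, or None if preconditions fail."""
    pre, add, rem = SCRIPTS[script]
    if not pre <= state:
        return None
    return frozenset((state - rem) | add)

def compile_state_machine(initial):
    """Precompilation: explore every reachable state once, recording all
    legal script transitions, so no planning is needed at run-time."""
    transitions = {}
    frontier = deque([frozenset(initial)])
    while frontier:
        state = frontier.popleft()
        if state in transitions:
            continue
        transitions[state] = {}
        for script in SCRIPTS:
            nxt = apply_script(state, script)
            if nxt is not None:
                transitions[state][script] = nxt
                frontier.append(nxt)
    return transitions

def plan(transitions, state, goal_fact):
    """Shortest script sequence reaching a state that contains goal_fact,
    found by breadth-first search over the compiled machine."""
    queue, seen = deque([(state, [])]), {state}
    while queue:
        current, path = queue.popleft()
        if goal_fact in current:
            return path
        for script, nxt in transitions[current].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [script]))
    return None

machine = compile_state_machine({"asleep"})
print(plan(machine, frozenset({"asleep"}), "found_tune"))
# -> ['wake_up', 'stand', 'search']
```

The expensive exhaustive exploration happens once in `compile_state_machine`; at run-time, choosing the next animation reduces to lookups over the compiled transition table.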
The Persona project was successful in that it motivated work that helped advance the state of the art in animated-agent interfaces. However, Persona did not succeed in delivering an interface that everybody could use with ease. One of the main difficulties was in creating sufficiently comprehensive dialogue management, and the extent of this challenge was uncovered in some Wizard-of-Oz studies. These challenges and the general lessons learned are described in the Imagina '98 paper cited below.