Speech Recognition Technologies Gain Voice In New Apps

Bạch Hưng Nguyên
(NguyenBH)

Honorary member
May 2, 2003 (5:33 p.m. EST)
By Gregg Keizer, TechWeb News

Speech recognition, the long sought-after technology of translating talk into type, is finally gaining traction.
Among those talking up the renewed value of speech recognition is Bill Meisel, president of TMA Associates, a research firm that specializes in speech recognition. He has a theory explaining the marked increase in the volume and number of speech recognition product and technology announcements.
“The downturn in capital spending, particularly by the telecommunication industry, created a kind of plateau for speech for a while,” he said. “But now everyone's playing catch-up. Every call center is looking at adding speech technology, and those that have it are looking at expanding it.”
One clear signal of increased business activity in the speech sector is the merger of two rivals. Just last week, one of the leaders in U.S. speech recognition, ScanSoft, announced it would purchase rival SpeechWorks in a stock deal worth $132 million.
“ScanSoft gets a sweet spot in the interactive telephone market” from SpeechWorks' concentration in the interactive voice response market that caters to call centers, said Meisel. “And the acquisition gives SpeechWorks' technology a global outlet.”
Additionally, ScanSoft and IBM announced a set of agreements that will let the pair extend speech recognition across various server and desktop platforms.
As part of that deal, IBM will help ScanSoft port its telephony applications, VoiceRequest and DirectoryAssistance, to the WebSphere platform. The port will ensure that the ScanSoft apps are compliant with the VoiceXML 2.0 standard and integrate with IBM's own voice portal, WebSphere Voice Application Access.
Companies such as IBM, said Meisel, have a vested interest in boosting speech's presence in the enterprise, since speech requires a huge increase in computational capacity. The more computing horsepower needed by customers, the more likely they will buy newer and bigger servers.
Other agreements struck between ScanSoft and IBM expand the latter's licensing deal for ScanSoft's RealSpeak text-to-speech technology, and allow ScanSoft to distribute and support the desktop editions of IBM's own speech recognition software, ViaVoice.
ScanSoft's Dragon NaturallySpeaking and ViaVoice complement each other, the two companies said, because ViaVoice is available in a Macintosh version and has more extensive support for foreign languages, including Chinese, German, Spanish, and Japanese.
ScanSoft, which acquired the popular Dragon NaturallySpeaking speech recognition platform from Lernout & Hauspie in December 2001, shipped an updated Legal edition of the program on Wednesday.
Based on NaturallySpeaking 7, a general-purpose speech recognition package that was released in early March, Legal targets law firms and legal professionals, who, along with medical workers, are among the most avid users of speech recognition software. The $995 program -- which is also available in multi-seat and site licensing packages -- includes a 250,000-word vocabulary complete with legal-specific terms, French- and Latin-based law phrases, court names, and legal abbreviations. Version 7 boasts an accuracy rate of close to 99 percent, claimed ScanSoft.
Intel, meanwhile, recently unveiled open-source software that will allow developers to integrate a "read my lips" technology to make speech recognition even more accurate. On Monday, Intel released its Audio Visual Speech Recognition (AVSR) software that lets computers monitor a speaker's facial features and track their mouth movements.
In noisy environments, such as public places or crowded, open offices, the technology -- when paired with a computer-controlled video camera and speech recognition software -- can reduce errors by as much as 55 percent, according to Intel.
The AVSR software includes C source code and is royalty-free for redistribution. It can be downloaded from Intel's Web site.
Also on Monday, SpeechWorks released version 2.0 of its SpeakFreely platform, an integrated set of tools that improves call-in recognition of natural language, open-ended queries to customers over the phone. SpeechWorks' partners can deploy SpeakFreely without modifying their telephony infrastructure, and add the capability to recognize such dialog as, “What kind of difficulty are you having?” when callers dial in to support or customer service centers. The suite includes SpeakFreely Tools, a speech recognition engine, and a grammar compiler.
Meisel sees speech becoming even more important down the road, though not in the “Star Trek” mode of talking to your computer. Instead, he predicts that speech technologies will turn the mundane telephone into a Web-style information resource.
“The next big deal is the evolution of the telephone,” he said. “The interactive style [of information retrieval] that you can get on the Web will come to the phone. Navigating around a broker's site can be a fairly lengthy experience. But you'll be able to call up the broker, say 'IBM' or 'Motorola' or whatever, and get the quote you're after.
“There's no reason why you can't do on the phone what you can do now on the Web,” he added.
============
I am posting this article because I am currently interested in this field.
Nguyen