It’s easy to take Siri, Cortana, Alexa or Google Assistant for granted, but a lot of work has gone into making the technology work accurately and seamlessly. I use these assistants all the time to play music or call home when I’m driving. Voice-enabled interactive applications have become a feature that more and more consumers rely on in their day-to-day lives. In the past year alone, the U.S. market for smart speakers has grown 128%.
To capitalize on this growing market, service providers are looking to add speech-enabled applications and services to their networks as well. For adoption to keep growing at a rapid pace, quality of experience, responsiveness, and streamlined user interfaces remain critical.
While the devices, and speech interfaces in general, are now nearly ubiquitous, "nearly" is the operative word. One gap that communications service providers are well positioned to fill is in-call speech enablement. The traditional approach of relying on a server-based natural language technology holds back growth: a one-size-fits-all solution can be overkill in cost, complexity, and performance, making many new applications cost prohibitive.
Challenges in Delivering In-Call Speech-Enabled Services
- Complexity – requires maintaining additional external network elements
- Cost – adds operational and capital costs for those external network elements
- Quality – for callers on the go, unpredictable network conditions can degrade recognition accuracy unless the audio input is optimized
At Radisys, our MediaEngine virtual media server platform overcomes the challenges of the traditional approach to in-call speech-enabled applications. The Radisys MediaEngine supports:
- Media optimization to correct impairments introduced by network conditions or poor call quality
- Integrated wake word detection – a small-footprint, low-cost solution that reduces cost and hardware requirements by triggering natural language or transcription processing only when a command word or phrase is spoken
- Support for numerous third-party natural language engines, whether server-based, in-network, or cloud-hosted (e.g., Google, IBM Watson)
Developers can now draw on a range of best-in-class tools and choose the right technology, or mix of technologies, for their exact application requirements. More sophisticated applications that combine speech recognition with other capabilities, such as collaboration, video bot interactions, and recording, can deliver more value to users, greater brand leverage, and service stickiness.
These enhancements enable a much wider range of applications, limited by imagination rather than cost. Service providers can deploy a number of innovative solutions, such as enhanced customer service or voice navigation of applications like conference commands, or develop new revenue streams by using speech analytics to generate actionable marketing data.
Consumers may take interactive speech applications for granted, but there’s no reason service providers should. As the drive toward more speech-enabled devices and services continues, the enhancements to MediaEngine’s integrated speech recognition capabilities keep it the industry-leading virtual media server platform for service providers.
To learn more about the integrated speech recognition solutions available in MediaEngine, or to discuss how Radisys can deliver the right media processing solution for your voice-enabled applications and services, contact me at al.balasco@radisys.com.
About the Author