Advancements in AI and Voice Technology

In this article, we dive into the technologies offered by DSP Concepts and take a closer look at how they work together to make better voice-enabled products. We then examine other advancements in machine learning and the coming opportunities for AI in voice-enabled products. What does all this mean for our products today, and tomorrow? How might advancements in artificial intelligence impact voice product development in the near future?

Solutions Needed in Current Voice-Enabled Products
According to a report by Market Research Future, the global market for voice assistants is expected to reach USD 7.3 billion by 2025, a CAGR of more than 24%. Because the market splits by geography and device type, voice-enabled products must be competitive in performance and adaptable to different languages and regions to capture the largest share of their respective segments.

The additional market restraints of high development and production costs, along with the sometimes challenging integration of voice technology, can pose problems for product makers, who must grapple with the development time needed to create and deploy competitive voice recognition while also adapting their designs for different languages and regional accents.

From a consumer standpoint, some major obstacles that one may expect to encounter with voice-enabled products are problematic acoustic environments, latency involved in interpreting and processing commands, and inaccurate voice recognition.
These issues point to the need for voice products that are robust to noise, offer flexible command sets for deployment across multiple regions, and include a combination of input signal processing features, known as an audio front end (AFE), that can be scaled to meet physical or cost constraints.

DSP Concepts Solutions
To meet the demands of voice product development, DSP Concepts provides flexible solutions: the Audio Weaver platform and the TalkTo audio front end. Combined with the edge-based Air automatic intent recognition engine, these solutions comprise a set of tools that empower product makers to deliver noise-robust systems with multilingual support, low latency, and flexible command sets, all while reducing development cost and shortening the time to market.


Audio Weaver is a low-code, hardware-independent audio platform that streamlines the development workflow from prototyping to production. Audio Weaver has two parts: AWE Designer and AWE Core. With the AWE Designer application, teams use a drag-and-drop interface to craft designs quickly. For final testing, tuning, and production, designs created in AWE Designer are deployed to a target product (an MCU, dedicated DSP, or SoC) running the embedded AWE Core runtime libraries. This dynamic instantiation of audio processing features allows rapid iteration, and team members can develop features in parallel before deploying a finished design. Integration with the final product is simplified, since each aspect of the design targets libraries that already exist on the device.


TalkTo is a customizable audio front end (AFE) that combines advanced signal processing techniques to deliver clean audio signals to voice assistants and speech recognition engines. The extensive signal processing offered by TalkTo can be tailored to a wide selection of use cases and performance footprints, and multiple microphone array topologies are available to meet the demands and constraints of numerous device form factors. TalkTo features can be chosen to match the processing power of a given system: scaled up to meet the demands of feature-rich, multi-microphone designs, or scaled down for low-power, processor-efficient designs.

Air is a direct speech-to-intent spoken language understanding system. Featuring edge-based universal language support, Air provides on-device command sets with highly accurate, concurrent understanding of multiple languages and accents, detecting intent without connecting to the cloud and without converting speech to text. Like TalkTo, Air is a scalable technology. For the most natural voice user experience, predefined commands can be triggered with variable or synonymous phrasing through a “slot” model, in which syntax is deconstructed and filtered into action/object/location slots, then mapped against the command set. For low-power devices, a “direct intent” model may be adopted instead, using a smaller vocabulary and less variable command phrasing.
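To make the slot model concrete, here is a minimal, hypothetical sketch of slot-based intent matching. The command set, synonym table, and function names are all invented for illustration, and the sketch works on text for readability; Air itself operates acoustically, without converting speech to text.

```python
# Hypothetical illustration of a slot-based intent matcher.
# Each intent is keyed by an (action, object, location) slot tuple.
COMMANDS = {
    ("turn_on", "light", "kitchen"): "KITCHEN_LIGHT_ON",
    ("turn_off", "light", "kitchen"): "KITCHEN_LIGHT_OFF",
}

# Synonyms map variable phrasing onto canonical slot values.
SYNONYMS = {
    "switch on": "turn_on", "turn on": "turn_on",
    "switch off": "turn_off", "turn off": "turn_off",
    "lamp": "light", "lights": "light",
}

ACTIONS = {"turn_on", "turn_off"}
OBJECTS = {"light"}
LOCATIONS = {"kitchen", "bedroom"}

def match_intent(phrase):
    """Deconstruct a phrase into slots and map them against COMMANDS."""
    text = phrase.lower()
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    tokens = text.split()
    # Fill each slot with the first token recognized for it, if any.
    action = next((t for t in tokens if t in ACTIONS), None)
    obj = next((t for t in tokens if t in OBJECTS), None)
    location = next((t for t in tokens if t in LOCATIONS), None)
    return COMMANDS.get((action, obj, location))
```

With this toy command set, "please switch on the kitchen lights" and "turn on the kitchen lamp" both resolve to the same intent, which is the point of the slot approach: many surface phrasings collapse onto one action/object/location tuple.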

Current and Future Approaches to AI
Air’s approach to embedded AI differs from that of traditional, cloud-based systems. While a cloud connection does assist certain use cases, such as performing a web search by voice, edge-based AI provides much lower latency and, by virtue of being offline, is private by design. Where the cloud may offer access to more information and more potential processing power, Air leverages the benefits of privacy and lower latency while occupying a smaller processing footprint and providing the user with an intuitive voice UI.

Technologies like Air point toward a future in which AI devices are intelligent in their own right and no longer depend on an internet connection. Aside from the appeal of more natural interaction with smarter machine listeners, embedded AI demonstrates real-world benefits that incentivize wider adoption. With processing performed locally, devices utilizing this type of technology can be deployed with fewer geographical hurdles, as there is no requirement for network infrastructure or an ISP. With no reliance on third-party services such as Google Assistant or Alexa Voice Service, device integration is simplified, feedback is more responsive, and owners of voice devices retain their own data.

Advancements in machine learning are also paving the way for future voice AI development. Machine learning is important to the future of voice-enabled assistive products, allowing better collection of data from multiple sources or sensors, more useful action, and less conscious intervention by the user. In short, products will become more powerful and more user-friendly.

Tiny Machine Learning (TinyML) is an emerging field that brings deep learning models, embedded software, and on-device data analytics to low-power hardware. As this field progresses, AI models will keep shrinking to occupy as little space as possible, enabling ever-smaller devices to become smarter than ever before.

Automated Machine Learning (AutoML) seeks to automate steps of the machine learning workflow, such as data preprocessing and feature selection. AutoML can be thought of as AI building AI: where the process can be automated, a high degree of expertise is no longer necessary to make use of machine learning models. With this, future AI systems will be able to train quickly and adapt to different technologies and use cases.

Visions of smarter, futuristic voice assistants share some common threads. The machine listeners of the future are almost always imagined as more conversational and intelligent, with queries and responses that closely match the pace and tone of human-to-human dialogue. We envision devices that can interpret and act upon biometric data, with an AI that is astutely reactive, detecting vocal inflections or stress and making recommendations or queries based on the user’s disposition. Such devices also pose questions and give responses that mimic conversation, backed by an ability to recall previous interactions and adapt accordingly, creating what seems to be a rapport with the user. This ability to respond to cues and carry an actual dialogue is the most common depiction of the futuristic virtual assistant.

Future voice assistants are also expected to be more proactive, pulling information from a variety of sources and adjusting behavior and recommendations to match. Consider a smart appliance that learns the user’s schedule and combines that data with time-of-use utility pricing to run at the lowest possible cost within the confines of the user’s day-to-day activity. This is the heart of machine learning: processing information from numerous sources, then using the data to improve the execution of a task.

Meeting the Future
As we consider the progression in these fields and envision what is to come, how do technologies like Audio Weaver, TalkTo, and Air help product makers move forward?

Audio Weaver helps product makers approach the future by enabling rapid innovation while mitigating risk. Designs are created by placing signal processing building blocks known as modules on a virtual canvas, connecting them with virtual wires, and adjusting module properties to tune the design. Designs can then be auditioned from within AWE Designer using the PC’s sound card. Multiple team members can work on different portions of a design concurrently, developing features in parallel and later combining them into a final design. With this collaboration and the ability to quickly and seamlessly test iterations and new designs, the entire process is streamlined.

With the ability to include IP developed by third parties, Audio Weaver also allows integration of emerging technologies in the form of additional, customized modules. Dozens of third-party algorithms are included, such as immersive 3D audio rendering and active noise cancellation, giving developers sophisticated, specialized systems to build with.

Similarly, the customizability of the TalkTo audio front end offers the performance and flexibility required to meet future voice UI use cases. TalkTo can scale to meet the demands of various products, from single-microphone designs with noise reduction, to designs utilizing 8-microphone arrays with acoustic echo cancellation, beamforming, adaptive interference cancellation, and more.
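To make the beamforming idea above concrete, here is a minimal, hypothetical delay-and-sum beamformer for a linear microphone array. This is the textbook technique only; TalkTo's production algorithms are proprietary and considerably more sophisticated, and all parameter names here are invented.

```python
# Minimal, hypothetical delay-and-sum beamformer for a linear mic array.
# Textbook technique for illustration; not TalkTo's actual algorithm.
import math

def delay_and_sum(mic_signals, mic_positions_m, angle_deg, fs=16000, c=343.0):
    """Steer a linear array toward angle_deg by aligning and averaging.

    mic_signals: one equal-length list of samples per microphone.
    mic_positions_m: each mic's position along the array axis, in meters.
    """
    # Relative arrival time (in whole samples) of a plane wave from
    # angle_deg at each microphone.
    arrivals = [round(fs * x * math.cos(math.radians(angle_deg)) / c)
                for x in mic_positions_m]
    # Delay each channel so every arrival lines up with the latest one.
    comp = [max(arrivals) - a for a in arrivals]
    n = len(mic_signals[0])
    out = []
    for i in range(n):
        acc = 0.0
        for sig, d in zip(mic_signals, comp):
            j = i - d
            if 0 <= j < n:  # treat samples outside the window as zero
                acc += sig[j]
        out.append(acc / len(mic_signals))
    return out
```

Signals arriving from the steered direction add coherently after the per-channel delays, while off-axis sounds are partially cancelled by the misaligned averaging; that is the basic mechanism a multi-microphone AFE builds upon.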

Finally, Air’s linguistic flexibility allows a single product version to be deployed across a wide geographic market, freeing up development overhead. The solution also provides some futureproofing by virtue of its small operating footprint, which allows it to be embedded on smaller, low-power devices. This efficient resource usage also enables Air to coexist with other, more resource-intensive technology such as on-device machine learning models. Additionally, Air’s proprietary acoustic-only approach and slot-model architecture let it accurately understand voice commands with variable phrasing, giving end users a flexible and natural voice experience.

The flexibility and power of the technologies available from DSP Concepts match the trajectory of incoming advancements in AI, and can be adopted by developers who wish to stay ahead of the curve and capture a larger portion of the growing voice market.

Visit DSP Concepts at CES 2022 for a demonstration of our reference design featuring TalkTo and Air.

For more information about TalkTo and the Audio Weaver platform, please visit or contact. To learn more about Air, please visit or email.