Amazon has introduced Nova Sonic, a powerful new generative AI voice model that brings human-like conversation capabilities to its ecosystem. Designed to rival voice models from OpenAI and Google, Nova Sonic promises faster response times, better speech recognition, and more natural-sounding interactions.
Built for natural conversations
Nova Sonic is Amazon’s direct response to the evolution of AI-powered assistants. Unlike older Alexa models, which often felt robotic, Nova Sonic can process voice natively and deliver smoother, human-like replies. It takes cues from the user’s pauses and interruptions, making dialogues feel more fluid.
According to Amazon, Nova Sonic has a response latency of 1.09 seconds, compared with 1.18 seconds for OpenAI’s GPT-4o, and can understand users even in noisy environments or when they mumble.
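To put the two latency figures in perspective, the short Python sketch below works out the gap between them. The numbers are the ones Amazon quotes; the function name `latency_gap` is made up for this illustration, and the calculation says nothing about how the figures were measured.

```python
def latency_gap(faster_s: float, slower_s: float) -> tuple[float, float]:
    """Return the absolute gap in seconds and the gap relative to the slower figure."""
    absolute = slower_s - faster_s
    relative = absolute / slower_s
    return absolute, relative


# Figures quoted by Amazon in the article (seconds).
NOVA_SONIC_S = 1.09
GPT_4O_S = 1.18

if __name__ == "__main__":
    gap_s, gap_ratio = latency_gap(NOVA_SONIC_S, GPT_4O_S)
    print(f"Gap: {gap_s * 1000:.0f} ms ({gap_ratio * 100:.1f}% lower)")
```

On these figures the difference is about 90 milliseconds, or roughly 7.6% lower than the GPT-4o number.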
Now available through Bedrock
The model is being made available to developers via Amazon Bedrock, the company’s platform for building enterprise AI applications. Nova Sonic uses a bi-directional streaming API, enabling real-time, back-and-forth communication between users and apps.
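The idea behind bi-directional streaming can be illustrated with a small simulation. The sketch below is not the Bedrock API: it uses only Python's standard `asyncio` library, and `fake_voice_model`, `send_audio`, `receive_replies` and `run_session` are hypothetical names invented for this example. It shows the general pattern in which the app keeps sending audio chunks while replies stream back on a separate channel.

```python
import asyncio


async def fake_voice_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for a streaming model: answers each chunk as it arrives."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # sentinel: the client has finished sending
            await outbound.put(None)
            return
        await outbound.put(f"reply-to:{chunk}")


async def send_audio(chunks: list, inbound: asyncio.Queue) -> None:
    """Client side, upstream: push chunks without waiting for replies."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # give the other tasks a chance to run
    await inbound.put(None)


async def receive_replies(outbound: asyncio.Queue) -> list:
    """Client side, downstream: collect replies until the model signals the end."""
    replies = []
    while True:
        reply = await outbound.get()
        if reply is None:
            return replies
        replies.append(reply)


async def run_session(chunks: list) -> list:
    inbound: asyncio.Queue = asyncio.Queue()
    outbound: asyncio.Queue = asyncio.Queue()
    # Sending and model processing run concurrently with receiving.
    model = asyncio.create_task(fake_voice_model(inbound, outbound))
    sender = asyncio.create_task(send_audio(chunks, inbound))
    replies = await receive_replies(outbound)
    await sender
    await model
    return replies


if __name__ == "__main__":
    print(asyncio.run(run_session(["chunk-1", "chunk-2", "chunk-3"])))
```

Because sending and receiving are separate tasks, neither side has to wait for the other to finish a full turn, which is what makes interruptions and quick back-and-forth exchanges possible in this style of interface.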
Amazon also touts Nova Sonic as the most cost-efficient voice AI model, offering up to 80% cost savings compared to OpenAI's GPT-4o.
Powering the next-gen Alexa+
Parts of Nova Sonic are already powering Alexa+, Amazon’s upgraded digital assistant. Rohit Prasad, SVP and Head Scientist of AGI at Amazon, said that the model builds on years of experience with orchestration systems that Alexa uses to route commands, fetch real-time data, and take actions across apps.
He explained that Nova Sonic excels at interpreting user intent and selecting the right tool for the task, whether that means fetching online information, querying proprietary databases, or operating third-party services.
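The routing step Prasad describes can be pictured with a toy example. The Python sketch below is purely illustrative: the three tool functions, the `KEYWORDS` table and `select_tool` are hypothetical, and the keyword matching stands in for the intent interpretation that, per Amazon, the model itself performs.

```python
from typing import Callable, Dict, Tuple


# Hypothetical tools mirroring the three categories mentioned in the article.
def search_web(query: str) -> str:
    return f"[web] results for: {query}"


def query_database(query: str) -> str:
    return f"[db] records matching: {query}"


def call_service(query: str) -> str:
    return f"[service] action requested: {query}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "web": search_web,
    "database": query_database,
    "service": call_service,
}

# Assumption: simple keyword cues replace real intent detection.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "database": ("order", "account", "invoice"),
    "service": ("book", "play", "turn on"),
}


def select_tool(utterance: str) -> str:
    """Return the name of the tool whose cue words appear in the utterance."""
    text = utterance.lower()
    for tool_name, cues in KEYWORDS.items():
        if any(cue in text for cue in cues):
            return tool_name
    return "web"  # fall back to a general web lookup


def handle(utterance: str) -> str:
    return TOOLS[select_tool(utterance)](utterance)


if __name__ == "__main__":
    print(handle("What is the status of my order?"))
    print(handle("Book a table for two"))
```

A production system would replace the keyword table with the model's own judgment, but the shape is the same: interpret the request, pick one tool, then hand the request to it.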
Superior accuracy across languages
In benchmark tests, Nova Sonic achieved a word error rate (WER) of just 4.2% across English, French, German, Spanish, and Italian on the Multilingual LibriSpeech dataset. On another test for multi-speaker, noisy environments, Nova Sonic outperformed GPT-4o by 46.7% in accuracy, according to Amazon.
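Word error rate is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below implements that standard definition with a word-level edit distance; it is a generic illustration, not Amazon's evaluation code, and the sample sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("turn on the kitchen lights", "turn the kitchen light")
    print(f"WER: {wer:.1%}")
```

In the sample above, one dropped word and one misheard word out of five give a WER of 40%. By the same measure, a 4.2% WER corresponds to roughly four errors per hundred reference words.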
Part of Amazon’s vision for AGI
Nova Sonic is a key part of Amazon’s broader plan to develop Artificial General Intelligence (AGI), meaning AI systems capable of doing anything a human can do on a computer. Prasad revealed that Amazon’s future roadmap includes models that can interpret and generate content across modalities such as image, video, voice, and sensory data.