Meet Sanas Accent Translation 4.5: Ultra-Fidelity, Increased Intelligibility, Clearer Speech

At Sanas, we design speech AI around a simple principle: technology should adapt to humans, not the other way around.
That principle drove the development of Accent Translation 4.0, the first model to demonstrate that real-time accent translation could make English speech more intelligible than the original audio on Automatic Speech Recognition (ASR) systems. Building on that success, we expanded our capabilities to support a wider global audience with new input accents from Africa & the Middle East and a new British output accent, bringing even more voices into the conversation.
But we didn’t stop at global expansion; we kept raising the bar on quality. We then introduced Speech Enhancement 1.0, with 24 kHz Ultra-Fidelity audio capturing the warmth, texture, and presence that transcend standard telephony.
Then our customers asked a critical question: "Can we have both Accent Translation and Ultra-Fidelity audio together?"
With Accent Translation 4.5, the answer is yes. Built on our 24 kHz Ultra-Fidelity architecture, Sanas Accent Translation 4.5 delivers accent translation with unprecedented clarity, naturalness, and acoustic detail — bringing the benefits of Speech Enhancement and Accent Translation into a single system.
In this article, you’ll hear side-by-side audio comparisons, see how Accent Translation 4.5 improves naturalness and intelligibility over Accent Translation 4.0, and learn about the technical breakthroughs that make 24 kHz accent translation possible in real time.
What’s New in Accent Translation 4.5?
With Accent Translation 4.5, we’re introducing a set of upgrades that dramatically elevate speech quality across fidelity, naturalness, and intelligibility. Here's what's new:
- 24 kHz Ultra-Fidelity Audio: A major leap in audio resolution that delivers richer, fuller voice quality. This upgrade captures high-frequency harmonics like crisp fricatives and subtle breath sounds that standard 8 kHz low-fidelity and even 16 kHz high-fidelity audio can't reproduce.
- Enhanced Naturalness and Stability: Refined speech synthesis techniques make voices sound even more natural, less synthetic, and more stable across long utterances and complex phrases.
- Superior Intelligibility for Challenging Inputs: Targeted model enhancements and expanded training data deliver significant intelligibility gains, especially for Latin American, African, and Middle Eastern input accents.
Why 24 kHz Matters: Beyond Wideband
TLDR: While most communication tools cap audio quality at 8 kHz or 16 kHz, Sanas' latest AT model upgrades it to 24 kHz Ultra-Fidelity, capturing the full richness of the human voice.
For decades, contact centers and enterprise communication platforms have been constrained by narrowband (8 kHz) and wideband (16 kHz) audio limits. While functional, these limits cut off the upper frequencies of the human voice responsible for clarity, articulation, and vocal nuance. These bandwidth ceilings blur consonants, flatten the crisp "s" sounds, and remove the breath of a laugh and other subtle cues that make speech sound natural and easy to understand.
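The ceiling described above follows from the sampling theorem: a signal sampled at a given rate can only carry frequencies up to half that rate. The short sketch below works through that arithmetic, assuming the 8, 16, and 24 kHz figures in this article are sampling rates; the function and dictionary names are illustrative only.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2.0


# Sampling rates named in this article (assumed to be sampling rates, in Hz).
RATES_HZ = {"narrowband": 8_000, "wideband": 16_000, "Ultra-Fidelity": 24_000}


def bandwidth_table(rates=RATES_HZ):
    """Map each label to the upper frequency limit its sampling rate allows."""
    return {name: nyquist_hz(rate) for name, rate in rates.items()}


for name, limit_hz in bandwidth_table().items():
    print(f"{name}: content up to {limit_hz / 1000:.0f} kHz")
```

Under that assumption, narrowband audio tops out at 4 kHz, wideband at 8 kHz, and 24 kHz audio at 12 kHz, which is the extra range the rest of this section is about.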
Sanas Accent Translation 4.5 shatters this ceiling. By outputting speech at 24 kHz, we preserve the high-frequency spectrum that gives a voice its full presence, texture, and expressiveness. The model takes standard low-fidelity (8-16 kHz) input and intelligently reconstructs the missing upper-band frequencies to produce pristine 24 kHz audio, a quality difference that becomes immediately audible in the side-by-side examples included in this article.
How do we restore sound that isn't there? The model analyzes the harmonic structure of human speech. Because the "missing" high frequencies share a predictable relationship with the lower frequencies that are present, our algorithm uses the input as a blueprint to predict and regenerate the upper spectrum, effectively filling in the details that the lower sampling rate discarded.
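To see why the upper band has to be predicted rather than simply resampled, here is a small NumPy sketch. It is not the Sanas algorithm; it only shows that conventional band-limited upsampling of an 8 kHz signal to 24 kHz leaves the region above 4 kHz empty, while a signal generated natively at 24 kHz has energy there. The toy harmonic tone and all function names are assumptions made for illustration.

```python
import numpy as np


def harmonic_tone(sample_rate_hz, f0_hz=200.0, duration_s=1.0):
    """Toy 'voiced' signal: every harmonic of f0 strictly below Nyquist, amplitude 1/k."""
    t = np.arange(int(sample_rate_hz * duration_s)) / sample_rate_hz
    n_harmonics = int(np.ceil((sample_rate_hz / 2.0) / f0_hz)) - 1
    return sum(np.sin(2 * np.pi * k * f0_hz * t) / k for k in range(1, n_harmonics + 1))


def fft_upsample(signal, factor):
    """Band-limited upsampling by zero-padding the spectrum; adds no new frequencies."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    # Rescale so sample amplitudes match the original signal.
    return np.fft.irfft(padded, n=n * factor) * factor


def energy_fraction_above(signal, sample_rate_hz, cutoff_hz):
    """Share of total spectral energy at or above cutoff_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    return float(power[freqs >= cutoff_hz].sum() / power.sum())


narrowband = harmonic_tone(8_000)                # content only below 4 kHz
upsampled = fft_upsample(narrowband, factor=3)   # now 24 kHz samples, same content
native = harmonic_tone(24_000)                   # harmonics up to just under 12 kHz

print("upsampled, energy above 4 kHz:", energy_fraction_above(upsampled, 24_000, 4_000))
print("native,    energy above 4 kHz:", energy_fraction_above(native, 24_000, 4_000))
```

The upsampled signal has more samples per second but essentially no energy above 4 kHz, so anything a listener hears in that range has to come from a model that predicts it from the lower band.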
Key Benefits of 24 kHz Ultra-Fidelity
- Crystal Clear Fricatives: Sounds like f, s, sh, and th are often lost in lower bandwidths, leading to confusion. 24 kHz renders these with precision.
- Reduced Listening Fatigue: Higher-resolution audio is easier for the brain to process, improving comfort and comprehension during long conversations.
- Future-Proofing Your Voice Stack: As communication platforms move toward HD audio, your accent translation pipeline is already optimized for next-gen audio.
Ready to hear the difference for yourself? Compare the 8 kHz, 16 kHz, and 24 kHz examples included below.
Original | AT (8 kHz) | AT (16 kHz) | AT (24 kHz) |
The Best Just Got Better: Accent Translation 4.5 vs. 4.0
When we launched Accent Translation 4.0 three months ago, we redefined what real-time accent translation could achieve, proving that translated English speech could be more intelligible than the original audio. Accent Translation 4.5 builds on this foundation with meaningful upgrades in naturalness, intelligibility, and acoustic richness that you can both hear and measure.
1. More Natural, Full-Spectrum Audio Quality
TLDR: While Accent Translation 4.0 was smooth, Accent Translation 4.5 brings out full, natural human clarity.
Accent Translation 4.5 produces speech with greater fullness, harmonic detail, and vocal presence. In blind A/B tests conducted with independent United States listeners, Accent Translation 4.5 was preferred over 4.0 in 63.18% of trials for naturalness, driven by improvements in "fullness" and "richness" in the voice.

By capturing the full harmonic structure of speech, we ensure that the translated accent doesn't just sound more articulate; it sounds lifelike, expressive, and true to the original voice.
Hear how the upgrades enhance depth and detail across male and female speakers with a range of input accents.
Original | Accent Translation 4.0 | Accent Translation 4.5 |
2. Higher Intelligibility Across More Accents
TLDR: Smarter models and more data mean better understanding.
Accent Translation 4.0 set a high bar by reducing the Word Error Rate (WER) across multiple accents in English. Accent Translation 4.5 pushes this even further with architectural improvements and a significantly expanded, more diverse training dataset.
Improving intelligibility isn't just about audio resolution; it’s about the intelligence behind the model. Measured with a state-of-the-art ASR model on a comprehensive evaluation dataset, Accent Translation 4.5 achieves a 16.6% overall relative WER reduction compared with Accent Translation 4.0.
The largest gains appear for input accents whose speech patterns include tonal, rapid, or phonetically complex elements: a 29.4% relative improvement for Middle Eastern (ME) accents and a 23.3% relative improvement for African (AFR) accents.
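For readers less familiar with the metric, a relative WER reduction is the drop in WER divided by the baseline WER. The sketch below shows that arithmetic. The absolute WER values are not published in this article, so the 10% baseline used here is a purely hypothetical placeholder, and the function names are illustrative.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def wer_after_reduction(baseline_wer: float, relative_reduction: float) -> float:
    """WER implied by applying a relative reduction to a baseline WER."""
    return baseline_wer * (1.0 - relative_reduction)


# HYPOTHETICAL baseline, chosen only to make the arithmetic concrete.
hypothetical_baseline = 0.10

# Relative reductions reported in this article.
reported = {"overall": 0.166, "Middle East (ME)": 0.294, "African (AFR)": 0.233}

for label, reduction in reported.items():
    implied = wer_after_reduction(hypothetical_baseline, reduction)
    print(f"{label}: {hypothetical_baseline:.1%} -> {implied:.2%}")
```

A relative figure says nothing about the absolute starting point: the same 16.6% reduction could take a 10% WER to 8.34% or a 5% WER to 4.17%.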

Lowering WER isn’t just a technical benchmark; it addresses a long-standing issue in speech technology. Historically, Automatic Speech Recognition (ASR) systems have shown measurable bias against non-native and regional accents in English, leading to higher error rates for millions of speakers across CX, AI agents, transcription, healthcare, and more. By improving intelligibility for accented speakers, especially those historically underserved by ASR systems, Accent Translation 4.5 helps reduce these inequities and strengthens every system that depends on accurate speech understanding.
These gains represent more than an acoustic upgrade: they reinforce Sanas' mission to improve communication any time a human is in the loop and to make the speech ecosystem more inclusive for global speakers.
The side-by-side transcriptions below show how Accent Translation 4.5 resolves clipped consonants, distorted vowels, and ambiguous phrasing that Accent Translation 4.0 occasionally misinterpreted.
Original | Accent Translation 4.0 | Accent Translation 4.5 |
total okay yes that's right | total archy yes that's right | total okay yes that's right |
good afternoon miss the purpose of the call it's about a snow remove service requested | good afternoon miss the purpose of the call it's bowanow remove service requested | good afternoon miss the purpose of the call it's about a snow remove service requested |
oh donna you have no loans at the moment donna | oh donna you have no launch at the moment donut | oh donna you have no loans at the moment donna |
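The rows above can be scored with a standard word-level edit distance. This is a minimal sketch, not the evaluation pipeline used for the reported numbers. It assumes the "Original" column can be treated as the reference transcript, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# (reference, Accent Translation 4.0 transcript), copied from the table above.
# Assumption: the "Original" column is used as the reference.
rows = [
    ("total okay yes that's right",
     "total archy yes that's right"),
    ("good afternoon miss the purpose of the call it's about a snow remove service requested",
     "good afternoon miss the purpose of the call it's bowanow remove service requested"),
    ("oh donna you have no loans at the moment donna",
     "oh donna you have no launch at the moment donut"),
]

for reference, at40 in rows:
    print(f"4.0 WER: {word_error_rate(reference, at40):.2f}  |  {at40}")
```

Under that assumption, each Accent Translation 4.0 row scores a WER of 0.20, while the Accent Translation 4.5 transcripts match the reference word for word and score 0.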
A New Standard for Connection
Accent Translation 4.5 is more than a model upgrade; it represents a new chapter in what speech technology can deliver. By uniting real-time accent translation with 24 kHz Ultra-Fidelity audio, we bring unprecedented clarity, expressive detail, and stability to global communication.
These improvements matter everywhere clarity matters:
- in customer experience, where understanding drives trust
- in enterprise collaboration across global teams
- in AI and agentic systems that depend on accurate speech inputs
- and in any human-in-the-loop workflow where miscommunication has real consequences.
Most importantly, it supports Sanas’ mission to make speech technology more inclusive. By improving intelligibility for accented speakers historically underserved by ASR systems, it reduces inequities and expands access, ensuring every voice is not only heard, but fully understood.
And when clarity becomes the default, everything else scales: trust, comprehension, efficiency, and outcomes. Accent Translation 4.5 raises that standard across accents, environments, and distances.
Ready to hear what Ultra-Fidelity accent translation sounds like in your workflows? Request a personalized demo of Accent Translation 4.5 and experience the difference for yourself.








