By Shawn Zhang

November 18, 2025

Meet Sanas Speech Enhancement 1.0: Redefining Speech Clarity, Fidelity, Robustness, and Adaptability

Background noise competing with speech is a fact of life in cafés, open offices, cars, and customer calls. With the introduction of Speech Enhancement 1.0, Sanas sets a new benchmark for clear communication everywhere, for everyone. Speech Enhancement is built on a new proprietary AI model trained on massive real-world datasets, and it delivers speech that preserves the fullness of emotion, tone, and identity even in the most chaotic environments.

Read on to explore how Speech Enhancement 1.0 redefines the standard for real-time communication clarity.

Key Features and Benefits in Speech Enhancement 1.0

Speech Enhancement 1.0 introduces major advances across the board in speech fidelity, robustness, and adaptability.

Next-Generation Performance: Powered by a new AI model, Speech Enhancement 1.0 delivers breakthrough gains in both objective metrics and perceptual quality compared with the leading solutions on the market.
Best-in-Class Robustness: Speech Enhancement 1.0 is engineered to perform consistently across unpredictable, multi-speaker, and high-noise environments.
Ultra-Fidelity Audio: Expanding beyond low-fidelity (8 kHz) and high-fidelity (16 kHz) audio, Speech Enhancement now also operates at ultra-fidelity (24 kHz), capturing and preserving the full warmth, texture, and detail of human voices.
Smart Ringtone Passthrough: Speech Enhancement supports configurable ringtone passthrough, keeping contact-center agents and professionals alert for incoming calls.
Designated Modes for Every Scenario:
- Standard Mode: Preserves all speakers’ voices, well-suited for group calls or shared microphones.
- Voice Isolation Mode: Isolates and preserves only the foreground speaker while suppressing other voices for maximum clarity.

Ultra-Fidelity Audio

For decades, voice communication has run on a legacy standard: 8 kHz audio, productionized in the 1970s to make phone calls efficient. However, at that fidelity, only a fraction of the voice’s frequency range is captured: enough to understand the words, but not the full warmth behind them. Consonants blur, harmonics vanish, and conversation starts to sound flat.

At Sanas, we believe clarity means more than just hearing the words. It means hearing people’s tone, texture, and intent. Our previous bandwidth extension system reconstructed 16 kHz audio in real time from low-fidelity input, filling in detail that the original signal did not carry.

Now, with Speech Enhancement 1.0, we’ve gone a step further, introducing ultra-fidelity 24 kHz audio. The result is voice that feels real, full, and effortless to follow, even in noisy or fast-paced conversations.

Low Fidelity (8 kHz)

High Fidelity (16 kHz)

Ultra Fidelity (24 kHz)
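To make the three fidelity tiers concrete, here is a small worked calculation. It assumes the kHz figures above are sampling rates (the usual convention in telephony), in which case the Nyquist limit puts the highest representable frequency at half the sampling rate. The function and dictionary names are illustrative and are not part of any Sanas API.

```python
# Illustrative arithmetic only. Assumption: the kHz figures are sampling rates.
# Nyquist limit: a sampled signal can represent frequencies up to half its
# sampling rate.
SAMPLE_RATES_HZ = {
    "Low Fidelity": 8_000,
    "High Fidelity": 16_000,
    "Ultra Fidelity": 24_000,
}


def nyquist_bandwidth_hz(sample_rate_hz: int) -> float:
    """Return the highest frequency representable at this sampling rate."""
    return sample_rate_hz / 2


for label, rate in SAMPLE_RATES_HZ.items():
    bandwidth_khz = nyquist_bandwidth_hz(rate) / 1000
    print(f"{label} ({rate // 1000} kHz): up to {bandwidth_khz:.0f} kHz of audio bandwidth")
```

Under that assumption, the 8 kHz tier tops out at 4 kHz of audio bandwidth, while the 24 kHz tier reaches 12 kHz, which is where much of the consonant and harmonic detail mentioned above sits.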

Smart Ringtone Passthrough Feature

Clarity shouldn’t come at the cost of awareness.

For professionals who need to stay alert, such as contact-center agents or remote teams handling live calls, Speech Enhancement 1.0 introduces Smart Ringtone Passthrough, a configurable feature that lets ringtones through so that you can hear an incoming call.

It intelligently detects specific alert tones such as ringtones or call notifications and allows them to pass through while continuing to suppress all other background sound. You stay focused and responsive without background chaos breaking your concentration.

Original (Input)

Sanas SE1.0

Sanas SE1.0 with Ringtone Passthrough
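The passthrough behavior described above can be pictured with a toy mixer. This is a minimal sketch, not the Sanas implementation: it assumes some upstream stage has already separated each audio frame into labeled components, and the labels ("speech", "alert", "noise"), the function `mix_output`, and the `ringtone_passthrough` flag are all hypothetical names chosen for illustration.

```python
from typing import Dict, List


def mix_output(components: Dict[str, List[float]],
               ringtone_passthrough: bool) -> List[float]:
    """Sum the components that should reach the listener.

    Assumption: every component list has the same number of samples.
    Speech is always kept, alert tones are kept only when passthrough is
    enabled, and every other component is dropped.
    """
    if not components:
        return []

    kept_labels = {"speech"}
    if ringtone_passthrough:
        kept_labels.add("alert")

    frame_length = len(next(iter(components.values())))
    output = [0.0] * frame_length
    for label, samples in components.items():
        if label in kept_labels:
            for i, sample in enumerate(samples):
                output[i] += sample
    return output


# Hypothetical two-sample frame containing speech, a ringtone, and noise.
frame = {
    "speech": [0.25, 0.5],
    "alert": [0.5, 0.25],
    "noise": [0.125, 0.125],
}
print(mix_output(frame, ringtone_passthrough=False))  # speech only
print(mix_output(frame, ringtone_passthrough=True))   # speech plus ringtone
```

Making the flag an explicit parameter mirrors the point that the feature is configurable: with it off, the ringtone is treated like any other background sound.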

Standard and Voice Isolation Modes

Imagine you’re on a video call from the middle of a busy office floor with keyboards clicking, teammates chatting, and phones ringing. At first, you’re the only person from your side on the call, so you need every voice except your own blocked out.

A few minutes later, you wave a colleague over to add their perspective. They step in behind you, and suddenly you want both of your voices included, but not the coworkers talking just a few feet away.

This is exactly the kind of situation Speech Enhancement 1.0 is built to handle.

Voices don’t just differ in loudness; they also carry cues about speaker distance and context. Speech Enhancement reads these cues automatically, distinguishing primary and secondary speakers based on acoustic distance rather than arbitrary volume thresholds. The result: the system knows when to elevate additional voices and when to enhance only the primary speaker.

This distinction powers Speech Enhancement’s two operating modes:

Standard Mode: Keeps all voices within close and medium range, enabling a natural multi-speaker experience.
Voice Isolation Mode: Enhances only the nearest voice, suppressing other voices for maximum primary speaker clarity.

Original (Input)

SE1.0 Standard Mode

SE1.0 Voice Isolation Mode

Speech Enhancement’s design recognizes the primary speaker and decides when to include nearby voices. The result is a model that adapts naturally to real-world conversations: in a conference call, it preserves colleagues’ voices; on a private call, it keeps only yours.
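The difference between the two modes can be sketched as a selection rule over speakers. This is a simplified illustration under stated assumptions, not the Sanas model: the real system infers distance from acoustic cues in the audio, whereas this sketch is handed explicit distances. The `Speaker` class, the `speakers_to_keep` function, and the 2.0 m cutoff for "medium range" are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    STANDARD = "standard"
    VOICE_ISOLATION = "voice_isolation"


@dataclass(frozen=True)
class Speaker:
    name: str
    distance_m: float  # hypothetical estimated distance from the microphone


# Hypothetical cutoff for "close and medium range"; not a published value.
MEDIUM_RANGE_LIMIT_M = 2.0


def speakers_to_keep(speakers: List[Speaker], mode: Mode) -> List[Speaker]:
    """Pick which voices survive enhancement in the given mode."""
    if not speakers:
        return []
    if mode is Mode.VOICE_ISOLATION:
        # Keep only the nearest voice.
        return [min(speakers, key=lambda s: s.distance_m)]
    # Standard mode: keep every voice within close or medium range.
    return [s for s in speakers if s.distance_m <= MEDIUM_RANGE_LIMIT_M]


# The office scenario: you, a colleague behind you, and a coworker nearby.
room = [Speaker("you", 0.3), Speaker("colleague", 1.0), Speaker("coworker", 4.0)]
print([s.name for s in speakers_to_keep(room, Mode.STANDARD)])
print([s.name for s in speakers_to_keep(room, Mode.VOICE_ISOLATION)])
```

In this toy version, Standard Mode keeps you and the colleague who stepped in, while Voice Isolation Mode keeps only you. In both modes the coworker a few feet away is dropped.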

Speech Enhancement 1.0 Results and Audio Samples

We compared Speech Enhancement 1.0 against a competitor using internally developed test sets made up entirely of real-life recordings. At Sanas, we believe that synthesized samples or low-noise public datasets cannot accurately represent real-world conditions. That’s why every sample in our evaluation reflects authentic, everyday environments, along with a range of edge cases and unpredictable outliers chosen to test robustness.

The results speak for themselves. In both Standard and Voice Isolation modes, Speech Enhancement 1.0 consistently outperforms the competitor at enhancing speech in background noise and at isolating the primary voice, delivering cleaner, more natural speech across a wide range of acoustic conditions. Readers interested in the full list of objective quality metrics (NISQA, DNSMOS, and more) can find detailed definitions and references in "Inside Sanas Speech Enhancement 1.0: The Science Behind Real-Time Voice Clarity."

Standard Mode: Multi-Speaker Clarity

Speech Enhancement 1.0’s Standard Mode preserves nearby voices for natural group conversation.

Original (Input): Rain noise and speech
Generic: Heavy noise leakage, syllable suppression (“s” in “so”), and muffled speech at times
Sanas SE1.0 Standard Mode: Clean voice

Original (Input): Background chatter and noise, multiple people in very close proximity to the microphone
Generic: Heavy speech deletion and noise leakage
Sanas SE1.0 Standard Mode: Speakers in close and medium range from the microphone are kept, and the primary speaker’s voice is well extracted

Original (Input): Loud music and speech
Generic: Heavy music leakage in silence and during speech
Sanas SE1.0 Standard Mode: Music fully disentangled from speech

Voice Isolation Mode

When competing human speech is the main interference, Speech Enhancement 1.0’s Voice Isolation Mode focuses solely on the primary speaker, elevating that voice above background talk without artifacts.

Original (Input): Siren and background voice
Generic: Background voice and noise leakage before and during speech, static noise
Sanas SE1.0 Voice Isolation Mode: No background voice, clear primary speaker voice

Original (Input): Speaker in traffic with background voices
Generic: Voice leakage during and before speech
Sanas SE1.0 Voice Isolation Mode: No background voice leakage

Original (Input): Deteriorated signal with background voice and noise
Generic: Background voice leakage and noise leakage
Sanas SE1.0 Voice Isolation Mode: Clean speech, no background voices

Across both modes, Speech Enhancement 1.0 consistently achieves higher objective scores and superior perceptual quality in listening tests, validating what users hear in practice. These improvements come without added distortion or latency, demonstrating that high fidelity and real-time performance can truly coexist.

Ultimately, Speech Enhancement 1.0 goes beyond suppressing noise. It distinguishes and preserves the nuances of human speech to deliver the clarity, warmth, and realism that make every voice sound natural, no matter the environment.

Interested in learning more about the science behind these modes, including how we developed, tested, and verified their performance? Check out "Inside Sanas Speech Enhancement 1.0: The Science Behind Real-Time Voice Clarity" to explore the data, methodology, and real-world results that power Speech Enhancement 1.0.

Redefining Clarity in Human Communication

With Speech Enhancement 1.0, Sanas moves beyond traditional noise suppression to understand how people actually sound in the world around them. By combining advanced acoustics, AI model design, and real-world simulation, our team has built a system that adapts to any environment without sacrificing the voice's warmth or character.

For enterprises, that means clearer customer interactions, stronger agent confidence, and smoother collaboration across every communication channel. For individuals, it means being heard fully and authentically no matter where you are.

At Sanas, our mission has always been to make communication more inclusive, intelligible, and human. By transforming background chaos into clarity, Speech Enhancement 1.0 brings us closer to a world where every conversation is clear, connected, and understood.