We Outperformed the World’s Top AI Labs - Then Opened It to Everyone

Noise Cancellation

When our team at Sanas set out to challenge the winners of Microsoft’s Deep Noise Suppression Challenge at ICASSP 2023, it was immediately clear to me that building a useful product has two components: achieving the best scores on standard metrics, and solving our customers' real-world problems. The latter is harder to express in numbers and requires experience that cannot be acquired in an ML lecture. At Sanas, we combine technical excellence with a passion for solving our customers' problems, which is what allows us to succeed on both fronts.

I’ve spent most of my professional life studying sound: not just how we perceive it, but how we live with it every day. I’ve seen how background noise can chip away at our ability to connect, to be heard, and to feel understood. That’s why I’ve dedicated a large part of my career to building technology that doesn’t just remove background noise but also restores clarity and intelligibility. And now, I’m proud to be part of a company that has made that technology free for everyone.

Winning awards validates scientific effort. But our greatest victory is something quieter: making advanced noise cancellation technology accessible to anyone, anywhere. Because silence, clarity, and understanding shouldn’t be a luxury.

The science that leads the way

At Sanas, we aren’t just solving an engineering problem. We’re addressing a human one. The world is noisy, not just literally, but emotionally. That noise builds up and disconnects us from each other. And too often, the tools that bring peace and clarity are locked behind paywalls.

Our background noise cancellation system doesn’t just suppress static or hum; it listens the way people listen. It prioritizes what matters in a conversation: low latency, intelligibility, and clarity. The core design goals are:

Preserving the speaker’s voice
Minimizing system delay
Suppressing all non-speech artifacts

At Sanas, innovation means more than a sophisticated AI model. It means delivering peace. It means restoring clarity in communication. Traditional approaches to Single-Channel Speech Enhancement (SCSE) typically sacrifice one quality to improve another: when latency is reduced, intelligibility tends to suffer. When I approached the challenge of noise cancellation, my guiding principle was to preserve every component of speech while keeping latency imperceptible.

We achieved this by combining several methods into a real-time, low-latency pipeline built from three simple blocks (see Figure 1).

Forward Projection

This module takes a short window of noisy speech and projects it into a high-dimensional latent space. The transformation helps to separate the statistical characteristics of speech and background noise, making it easier to isolate the signal of interest.
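The article does not publish how the projection is built, so the sketch below is only an illustration under stated assumptions: the projection is taken to be a linear analysis transform (a matrix mapping each short frame to a larger latent vector, similar in spirit to the learned encoders used in time-domain separation models), and a fixed random matrix stands in for trained weights. `frame_signal`, `make_basis`, and `forward_project` are hypothetical names, not Sanas APIs.

```python
import math
import random


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames of `frame_len` samples."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def make_basis(latent_dim, frame_len, seed=0):
    """Hypothetical stand-in for a learned encoder: a latent_dim x frame_len
    matrix of random weights. latent_dim > frame_len gives an overcomplete
    ("high-dimensional") representation."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(frame_len)
    return [[rng.gauss(0.0, scale) for _ in range(frame_len)]
            for _ in range(latent_dim)]


def forward_project(frame, basis):
    """Project one time-domain frame into the latent space: z = W x."""
    return [sum(w * s for w, s in zip(row, frame)) for row in basis]


if __name__ == "__main__":
    sr = 16_000
    # 10 ms of a 440 Hz tone at 16 kHz.
    x = [math.sin(2 * math.pi * 440 * n / sr) for n in range(160)]
    frames = frame_signal(x, frame_len=32, hop=16)  # 2 ms frames, 1 ms hop
    basis = make_basis(latent_dim=256, frame_len=32)
    z = forward_project(frames[0], basis)
    print(len(frames), len(z))  # 9 256
```

The frame length, hop, and latent size here are arbitrary illustrative values; a trained system would learn the basis rather than sample it.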

Mask Estimator

Inspired by techniques like Wiener filtering, this module predicts a gain mask that suppresses noise in the latent space while preserving speech components. The mask is multiplicative and adapts frame by frame to the incoming signal.
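In Sanas's system the mask is predicted by a model. The sketch below instead shows the classic closed-form Wiener-style gain that the text names as inspiration, applied multiplicatively per latent channel. It assumes per-channel estimates of the noisy power and the noise power are already available (tracking the noise power is outside the sketch), and the gain floor of 0.05 is an arbitrary illustrative value. Both function names are hypothetical.

```python
def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Per-channel Wiener-style gain G = S / (S + N).

    Speech power S is estimated by power subtraction, S = max(Y - N, 0),
    where Y is the noisy power and N the (assumed known) noise power.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = max(p_y - p_n, 0.0)
        denom = p_s + p_n
        g = p_s / denom if denom > 0 else 0.0
        # A gain floor avoids fully zeroing channels, which tends to
        # produce audible "musical noise" artefacts.
        gains.append(max(g, floor))
    return gains


def apply_mask(latent, mask):
    """Multiplicative masking: each latent coefficient is scaled by its gain."""
    return [z * m for z, m in zip(latent, mask)]


if __name__ == "__main__":
    mask = wiener_gain([10.0, 1.0, 2.0], [1.0, 1.0, 1.0])
    print(mask)  # [0.9, 0.05, 0.5]
    print(apply_mask([2.0, -4.0, 1.0], mask))
```

Calling `wiener_gain` once per frame with that frame's power estimates gives the frame-by-frame adaptation the text describes.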

Inverse Projection

The final module reconstructs the cleaned audio by reversing the latent projection. The output is a denoised segment of speech, which is then stitched together with prior segments to form a continuous, intelligible audio stream.
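The text states only that the inverse projection reverses the latent mapping and that segments are stitched into a continuous stream. A common way to do that stitching is windowed overlap-add, sketched below under assumptions: a linear synthesis matrix (the hypothetical `synthesis`, of shape frame_len x latent_dim) plays the role of the decoder, and frames are tapered with a periodic Hann window at 50% overlap, whose shifted copies sum to one, so that without any processing the overlapping frames add back up to the original signal. This is not necessarily how Sanas stitches segments.

```python
import math


def inverse_project(latent, synthesis):
    """Map a latent vector back to a time-domain frame: x_hat = V z,
    where `synthesis` (V) is a hypothetical frame_len x latent_dim decoder."""
    return [sum(v * z for v, z in zip(row, latent)) for row in synthesis]


def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def overlap_add(frames, hop):
    """Stitch consecutive frames into one signal by summing their overlaps."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        start = k * hop
        for j, s in enumerate(frame):
            out[start + j] += s
    return out


if __name__ == "__main__":
    frame_len, hop = 16, 8
    x = [1.0] * 64
    win = periodic_hann(frame_len)
    # Identity synthesis: latent == windowed frame, so decoding is a no-op.
    identity = [[1.0 if r == c else 0.0 for c in range(frame_len)]
                for r in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        latent = [w * s for w, s in zip(win, x[start:start + frame_len])]
        frames.append(inverse_project(latent, identity))
    y = overlap_add(frames, hop)
    # Interior samples (covered by two frames) reconstruct the input.
    print(all(abs(v - 1.0) < 1e-12 for v in y[hop:-hop]))  # True
```

In a streaming setting, only the samples that no future frame will overlap can be emitted, which is one reason window length directly affects latency.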

The system runs with an overall latency of 15 ms and supports both 8 kHz and 16 kHz audio. That makes it fast enough for live communication, including voice calls, online meetings, and real-time streaming, without perceptible lag.
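To put the 15 ms figure in concrete terms, the short sketch below converts a latency budget into a number of audio samples at each supported sample rate. The conversion is plain arithmetic; the function name is hypothetical, and the text does not say how the 15 ms is split between buffering, lookahead, and computation.

```python
def latency_to_samples(latency_ms, sample_rate_hz):
    """Number of samples that elapse during `latency_ms` at `sample_rate_hz`."""
    return round(latency_ms * sample_rate_hz / 1000)


if __name__ == "__main__":
    for sr in (8_000, 16_000):
        print(f"{sr} Hz: 15 ms = {latency_to_samples(15, sr)} samples")
    # 8000 Hz: 15 ms = 120 samples
    # 16000 Hz: 15 ms = 240 samples
```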

To validate our system, we turned to the most widely recognized benchmark in the field: the Microsoft Deep Noise Suppression Challenge, hosted at ICASSP 2023.

The challenge evaluates noise suppression models using two open-source metrics:

Perceptual Quality (DNSMOS): a deep-learning-based model trained to approximate human judgments of speech quality.
Word Accuracy (WAcc, derived from the word error rate, WER): a measure of how well an automatic speech recognition (ASR) model can transcribe speech after noise suppression.
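The ASR system used by the challenge is not reproduced here, but the WER it feeds is simple to compute once a reference transcript and an ASR hypothesis are available: it is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words, and word accuracy is 1 minus that value. The minimal sketch below assumes whitespace tokenization and no text normalization; scoring tools usually normalize case and punctuation first.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference words).

    Computed with a word-level Levenshtein (edit-distance) table.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER = {wer:.3f}, word accuracy = {1 - wer:.3f}")
    # WER = 0.333, word accuracy = 0.667
```

In the example, "sat" is substituted by "sit" and the second "the" is deleted: 2 errors over 6 reference words.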

We ran our model through the same testing pipeline used in the challenge, with the same data, the same evaluation metrics, and the same judge model. The results were clear: Sanas outperformed every published entry in the 2023 competition in both intelligibility and perceptual quality. This wasn’t just a benchmark win; it confirmed that our approach works at the highest level, both scientifically and practically. Table 1 shows the results of our testing alongside those of the official challenge entries.

The table reports DNSMOS results along three dimensions, together with the confidence level of each result. In brief:

SIG (signal quality): how natural and undistorted the speech sounds.
BAK (background noise quality): how well background noise is suppressed.
OVRL (overall quality): a composite rating of the overall audio experience.

Sanas achieved the highest scores across all three categories, outperforming every official submission from the 2023 challenge, including both Microsoft baselines. These results demonstrate the superior perceptual quality of Sanas’ model in real-world scenarios.
This benchmark validates my scientific work, but it is the hundreds of thousands of happy users of the technology we developed who truly drive my creative passion.
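The confidence values reported next to each DNSMOS score are commonly derived from per-clip scores. As an assumption about the general technique (not the exact procedure behind Table 1), the sketch below computes a mean score and a 95% confidence interval with the normal approximation, mean ± 1.96 · s / √n, where s is the sample standard deviation over n clips. The scores in the example are made-up placeholders, not results, and `mean_with_ci95` is a hypothetical name.

```python
import math
import statistics


def mean_with_ci95(scores):
    """Return (mean, half_width) of a normal-approximation 95% CI.

    half_width = 1.96 * s / sqrt(n), with s the sample standard deviation.
    Reasonable for the large clip counts typical of MOS evaluations; for
    small n a Student-t quantile would be more appropriate.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    mean = statistics.fmean(scores)
    half_width = 1.96 * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    placeholder_scores = [3.0, 3.2, 3.4, 3.6, 3.8]  # illustrative only
    m, h = mean_with_ci95(placeholder_scores)
    print(f"{m:.2f} ± {h:.2f}")  # 3.40 ± 0.28
```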

Why We Made It Free

After running our internal evaluations and outperforming the top models from the Microsoft DNS Challenge, we faced a familiar decision: commercialize the technology, license it to vendors, turn it into a business line. We chose a different route, not because we couldn’t monetize it, but because we believed the field needed a reset.

For years, high-performance noise suppression has been treated as a premium feature, gated behind subscriptions or bundled into complex enterprise tools. Meanwhile, the problem it solves is anything but niche. It's global, persistent, and growing. Students in shared apartments. Agents in busy call centers. Doctors in cross-border consults. These are not edge cases. They’re the majority. We saw an opportunity to shift the industry, not with marketing, but with science.

So we released the technology, fully open and free to use. No trials. No constraints. Just real-time, high-quality noise removal, available to anyone who needs it.

This isn’t the endpoint. It’s the foundation. What we’ve published is the beginning of a broader body of work, one focused on understanding human speech under adverse conditions and improving the quality of communication at scale. We’re not simply solving for noise. We’re building the next generation of speech infrastructure.

Conclusion: Redefining the Standard for Noise Cancellation

The benchmarks confirm what we already knew internally: our model sets a new standard. Against the most competitive entries in the Microsoft DNS Challenge, Sanas consistently delivered the highest perceptual quality across all dimensions.

But strong metrics alone aren’t enough. The real differentiator is what we chose to do with that performance. We made it publicly available, free of charge, because we saw a technical solution that deserved real-world adoption, not gatekeeping.

This release isn’t a one-off. It’s the foundation of a larger roadmap. There is still significant work ahead in real-time speech processing, from dynamic enhancement systems to broader integration with speech understanding and transformation. We plan to lead that work.

At Sanas, we approach this field with the same mindset we apply to the technology itself: remove the noise, keep what matters, and make it operate in real time.

More to come.