OpenAI Launches Three New Realtime Audio Models for Developers

The new models enable more natural voice, translation, and transcription capabilities in applications.

5/9/2026
Ali Abounasr El Alaoui

OpenAI has announced a significant expansion of its audio AI capabilities with the launch of three new models available through its API. These tools, GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, are designed to empower developers to create more natural and intelligent voice-driven applications. The release marks a strategic move from simple command-based audio interactions toward sophisticated voice interfaces that can reason, translate, and transcribe in real time.


A New Generation of Voice Intelligence

Leading the new suite is GPT-Realtime-2, the company's first voice model equipped with GPT-5-class reasoning. This model is engineered to manage complex user requests and maintain fluid, natural conversations without interruption. It introduces advanced features such as parallel tool calls, improved error recovery, and an expanded 128K context window to support more intricate workflows.

The enhancements deliver measurable gains over the previous model, GPT-Realtime-1.5. On the Big Bench Audio evaluation, GPT-Realtime-2 scored 15.2% higher for audio intelligence, demonstrating stronger reasoning and context management. These improvements enable the model to handle specialized terminology and adjust its tone, making interactions more reliable and better suited to production environments.

Bridging Global Communication Gaps

To address multilingual needs, OpenAI introduced GPT-Realtime-Translate, a live translation model that keeps pace with a speaker’s natural cadence. It supports over 70 input languages and provides translation into 13 output languages, facilitating seamless cross-language communication. This technology is poised to enhance user experiences in global customer support, international sales, and media.

Complementing the new offerings is GPT-Realtime-Whisper, a streaming speech-to-text model built for low-latency transcription. It transcribes audio as a person speaks, enabling responsive applications like live captioning for events and real-time note generation during meetings. This functionality allows businesses to integrate spoken interactions directly into their workflows as they happen.

Real-World Applications and Industry Adoption

Early adoption partners are already leveraging these models to build next-generation voice experiences. Zillow is developing an assistant to help users find homes, reporting a 26-point increase in call success rates on its most difficult benchmarks. Similarly, Priceline is using the technology to create an AI travel agent that can manage entire trips through conversational voice commands.

The models are being tried in sectors ranging from telecommunications to customer service. Deutsche Telekom is testing the translation model to make multilingual interactions feel more natural for its customers. Intercom, meanwhile, said the model's ability to reason through complex queries and handle domain-specific details felt like a step change for real customer conversations.

Safeguards and Commercial Availability

OpenAI has integrated multiple safeguards into the Realtime API to prevent misuse, including active classifiers that can halt conversations that violate its harmful-content guidelines. The company's usage policies prohibit deception and require developers to clearly disclose when end users are interacting with an AI. The platform also supports EU Data Residency and is covered by enterprise privacy commitments.

All three models are now available for developers through the Realtime API. GPT-Realtime-2 is priced at $32 per million input audio tokens and $64 per million output tokens. GPT-Realtime-Translate and GPT-Realtime-Whisper are priced per minute, at $0.034 and $0.017 respectively, making the advanced capabilities accessible for a range of applications.
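
To put those rates in context, the short Python sketch below estimates costs from the prices quoted above. The function names and the sample token and minute counts are hypothetical, chosen only for illustration. The article does not say how many tokens a minute of audio consumes, so token counts are taken as inputs rather than derived from call length.

```python
from decimal import Decimal

# Prices in USD as quoted in the article.
REALTIME_2_INPUT_PER_MILLION = Decimal("32")
REALTIME_2_OUTPUT_PER_MILLION = Decimal("64")
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")


def realtime_2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 cost from token counts (hypothetical helper)."""
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * REALTIME_2_INPUT_PER_MILLION
        + Decimal(output_tokens) * REALTIME_2_OUTPUT_PER_MILLION
    ) / million


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate cost for a per-minute model (hypothetical helper)."""
    return Decimal(minutes) * rate_per_minute


if __name__ == "__main__":
    # Illustrative workload only; these volumes are assumptions, not reported figures.
    print("GPT-Realtime-2:", realtime_2_cost(500_000, 250_000))
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

Because output tokens cost twice as much as input tokens, a GPT-Realtime-2 workload in which the model speaks as much as it listens spends roughly two-thirds of its budget on output.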


This launch represents a pivotal advancement in making voice a more powerful and intuitive interface between humans and software. By providing tools that can listen, reason, and act with greater sophistication, OpenAI is enabling developers to build a new class of voice applications. The real-world impact, demonstrated by early partners, suggests a significant shift toward more dynamic and helpful conversational AI across industries.