Stability AI Unveils Stable Audio 3.0 Open-Weights Models

Stability AI has unveiled Stable Audio 3.0, a new family of generative audio models designed to advance music creation. This release introduces significant improvements, including the ability to generate full-length musical tracks up to six minutes long. The models are built entirely on licensed data, addressing critical copyright concerns within the rapidly evolving AI music landscape.

A New Family of Audio Models

The Stable Audio 3.0 suite includes four distinct models tailored for various applications. The Small and Small SFX versions are optimized for on-device generation, while the Medium and Large models offer higher fidelity and complexity. These models range from 459 million to 2.7 billion parameters, providing a spectrum of capabilities for different creative needs.

Advancements in Generation and Accessibility

A key breakthrough is the extended generation length, with the Medium and Large models capable of producing compositions over six minutes long. This represents a substantial increase from the 47-second limit of previous open versions, allowing for more structured and coherent musical pieces. The new architecture ensures that melodic themes are maintained throughout the longer tracks, enhancing their overall quality.

In a move to foster community innovation, Stability AI has released the Small and Medium models with open weights on Hugging Face. Users can commercialize their creations under a Community License, while organizations with over $1 million in revenue require an Enterprise License. The most advanced Large model remains accessible exclusively through the company's API and for enterprise self-hosting.

Technical Innovations and Customization

The new models are powered by a novel semantic-acoustic autoencoder, enabling more flexible and precise audio generation. This architecture supports features like audio inpainting, which allows creators to edit or extend specific segments of a track without starting over. It also introduces variable-length generation, giving users per-second control over the output duration for greater creative freedom.

For the first time, Stability AI is providing documentation and support for LoRA (low-rank adaptation) training with its audio models. This efficient fine-tuning method, popular in image generation, trains a small set of low-rank matrices alongside the frozen base weights, allowing users to customize the models using their own sound libraries. This feature empowers artists and developers to create unique sonic palettes and styles tailored to their specific projects.
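For readers unfamiliar with the technique, the general idea behind low-rank adaptation can be sketched in a few lines of plain Python. This is an illustration of the generic method only, not Stability AI's training code: the layer sizes, the rank, the `alpha` scaling value, and every function name below are assumptions chosen for the example.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap the rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

def count(m):
    """Number of scalar entries in a matrix."""
    return sum(len(row) for row in m)

def lora_forward(x, W, A, B, alpha):
    """Frozen projection x W^T plus the scaled low-rank update (alpha / r) x A^T B^T."""
    rank = len(A)
    base = matmul(x, transpose(W))                          # pretrained path, never updated
    update = matmul(matmul(x, transpose(A)), transpose(B))  # trainable low-rank path
    scale = alpha / rank
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]

# Hypothetical sizes for one linear layer; real models are far larger.
random.seed(0)
d_in, d_out, rank = 64, 32, 4

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen weight
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, small random init
B = [[0.0] * rank for _ in range(d_out)]                                 # trainable, zero init

x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(2)]        # batch of two inputs
y = lora_forward(x, W, A, B, alpha=8.0)

full_params = count(W)             # entries a full fine-tune would update
lora_params = count(A) + count(B)  # entries the adapter trains instead
print(full_params, lora_params)
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the original one, and training only has to move the two small matrices. In this toy layer that is 384 values instead of 2,048, which is why the approach is attractive for customizing a large model on a modest sound library.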

Navigating the Competitive and Legal Landscape

This launch positions Stability AI within a competitive field that includes major players like Google and startups such as Suno and Udio. By training its models exclusively on licensed data through partnerships with Universal Music Group and Warner Music Group, the company proactively addresses legal challenges. This commitment to ethical data sourcing is a key differentiator in an industry facing scrutiny over copyright.

To bolster its professional music offerings, Stability AI has hired industry veteran Ethan Kaplan, formerly of Fender and Universal Audio. This strategic move mirrors a trend across the AI music sector, with competitors also recruiting experienced music executives. Such hires signal a deepening integration between technology firms and the established music industry to build artist-focused tools.

The release of Stable Audio 3.0 marks a significant milestone in the evolution of generative audio technology. By combining advanced capabilities, open accessibility, and a foundation of ethically licensed data, Stability AI is setting a new standard. This initiative not only provides powerful new tools for creators but also charts a more sustainable and collaborative path forward for AI in music.
