DeepInfra, a purpose-built cloud platform for AI inference, has announced a successful $107 million Series B funding round. The round, co-led by 500 Global and early Google engineer Georges Harik, will fund the scaling of its specialized cloud and global capacity. The funding arrives as the AI industry shifts focus toward the computational demands of inference for enterprise and agent-driven applications.
Addressing the AI Inference Bottleneck
The AI market is experiencing a significant inflection point as inference workloads begin to dominate computational needs. This shift is driven by open-source models reaching parity with proprietary systems and the rise of agent-based AI requiring continuous operation. Consequently, inference has become a critical system constraint that many general-purpose cloud platforms are not designed to handle efficiently.
According to Nikola Borisov, co-founder and CEO of DeepInfra, this shift unlocks a new wave of innovation at a fraction of the cost. He emphasized that inference is no longer a thin layer but the defining factor for the majority of AI workloads, and that the company was built from the ground up to deliver superior economics, performance, and security for this new always-on, distributed model.
A Purpose-Built Infrastructure for the Agentic Era
DeepInfra differentiates itself with a vertically integrated approach, owning and operating its GPU hardware across eight U.S. data centers. This full-stack control from chips to APIs enables structurally lower costs and more predictable latency than hyperscalers that rely on rented capacity. The company's architecture is specifically engineered for the sustained demands of high-throughput inference rather than general-purpose cloud tasks.
The platform is optimized for the emerging agentic era, where AI systems can make over 100 model calls per task. These continuous, high-volume token demands are the baseline workload DeepInfra is designed to support, ensuring predictable performance and cost. The company reports that nearly 30% of its weekly token volume already originates from agentic systems like OpenClaw.
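To make the scale of these agentic workloads concrete, the figures above admit a rough back-of-envelope calculation. The calls-per-task, weekly token volume, and agentic share come from the article; the average tokens per call is a purely illustrative assumption, as real values vary widely by model and prompt size:

```python
# Back-of-envelope sketch of agentic inference volume.
# Figures marked "from the article" are cited above; TOKENS_PER_CALL
# is a hypothetical average chosen only for illustration.

CALLS_PER_TASK = 100               # from the article: "over 100 model calls per task"
TOKENS_PER_CALL = 2_000            # hypothetical average (prompt + completion)
WEEKLY_TOKENS = 5_000_000_000_000  # from the article: ~5 trillion tokens per week
AGENTIC_SHARE = 0.30               # from the article: ~30% of weekly token volume

agentic_tokens_per_week = WEEKLY_TOKENS * AGENTIC_SHARE
tokens_per_task = CALLS_PER_TASK * TOKENS_PER_CALL
tasks_per_week = agentic_tokens_per_week / tokens_per_task

print(f"{tasks_per_week:,.0f} agentic tasks/week under these assumptions")
```

Under these assumed numbers, the platform's agentic traffic alone would correspond to millions of multi-call agent tasks per week, which is the sustained, high-throughput load profile the article describes.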
A key collaboration with NVIDIA further enhances the platform's performance and capabilities. DeepInfra is an early infrastructure partner in NVIDIA’s open AI ecosystem, supporting Nemotron models and the NemoClaw agent framework. By deploying advanced Blackwell and Vera Rubin GPUs with NVIDIA Dynamo software, the company is unlocking significant improvements in inference cost efficiency.
Investor Confidence and Market Validation
The investment reflects strong confidence in DeepInfra's specialized strategy. Tony Wang, Managing Partner at 500 Global, noted that purpose-built inference infrastructure will be fundamental to the next phase of AI development. He highlighted the DeepInfra team's proven ability to build and operate distributed systems at a global scale, a crucial factor for their backing.
The company's rapid growth serves as powerful market validation, with revenue tripling since the beginning of 2026. Processing nearly five trillion tokens per week, DeepInfra supports over 190 open-source models through its enterprise-ready platform. Jesse Proudman of Venice AI praised the platform for providing access to best-in-class models with essential reliability and speed.
This $107 million capital injection will accelerate DeepInfra’s mission to meet the escalating demands of production-grade AI. The funds are earmarked for expanding global compute capacity, enhancing developer tooling, and supporting the next generation of advanced models. As enterprise AI deployment matures, DeepInfra is strategically positioned to provide the critical infrastructure needed for the agentic era.

