
Inferact, an AI infrastructure startup founded by the core contributors behind the open-source inference project vLLM, has raised $150M in seed funding at an $800M valuation. The company is focused on making large-scale AI inference faster, cheaper, and easier to deploy.
The round was led by Andreessen Horowitz and Lightspeed, with participation from Databricks’ venture arm, the UC Berkeley Chancellor’s Fund, and additional investors.
Built by the team behind vLLM
Inferact was founded by Simon Mo, Woosuk Kwon, Kaichao You, and Roger Wang, the researchers and engineers behind vLLM, an open-source library widely used for large language model inference.
vLLM, short for virtual large language model, is maintained by a global open-source community and is already deployed by major technology companies including Meta and Google. The project focuses on optimising inference, the stage where trained AI models generate outputs in production environments.
Solving the inference bottleneck
As AI models grow larger and are used by more applications simultaneously, inference has become a major constraint. Running models for longer sessions, processing more tokens, and serving thousands of concurrent users place heavy demands on memory efficiency and hardware utilisation.
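To see why memory in particular becomes the constraint, a rough back-of-envelope calculation helps. The sketch below assumes a hypothetical 7B-class model (32 layers, 32 attention heads, head dimension 128, fp16); the figures are illustrative assumptions, not numbers published by Inferact or vLLM.

```python
# Back-of-envelope KV-cache sizing for an assumed 7B-class model.
LAYERS = 32
HEADS = 32
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16

def kv_cache_bytes(tokens: int) -> int:
    # Each token stores one key and one value vector per head, per layer.
    return 2 * LAYERS * HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens

print(f"per token: {kv_cache_bytes(1) / 1024:.0f} KiB")                      # ~512 KiB
print(f"one 4k-token session: {kv_cache_bytes(4096) / 2**30:.1f} GiB")       # ~2 GiB
print(f"100 concurrent 4k sessions: {100 * kv_cache_bytes(4096) / 2**30:.0f} GiB")  # ~200 GiB
```

Even under these modest assumptions, serving a few hundred long-running sessions consumes far more memory for the cache than for the model weights themselves, which is why cache management dominates serving efficiency.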
vLLM addresses these challenges by optimising how models manage memory and compute. Its PagedAttention technique stores the key-value cache in small fixed-size blocks rather than one contiguous buffer per request, reducing the memory wasted to fragmentation, while additional methods such as quantisation and parallel token generation help reduce latency and infrastructure costs.
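For context on how engineering teams consume vLLM today, its offline Python interface looks roughly like the following (a minimal sketch based on vLLM's public API; the model name and sampling settings are placeholder examples only).

```python
from vllm import LLM, SamplingParams

# Load any Hugging Face model that vLLM supports; "facebook/opt-125m" is
# just a small example model, not a recommendation.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = [
    "Explain what a KV cache is in one sentence.",
    "Name three bottlenecks in LLM serving.",
]

# vLLM batches these requests and manages the KV cache with PagedAttention
# internally; callers only see prompts in and completions out.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```

The appeal for adopters is that the batching, memory paging, and scheduling all happen behind this interface, which is the layer Inferact's commercial platform intends to build on.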
From open source to commercial infrastructure
Inferact has outlined two parallel goals. The first is to continue supporting and scaling the vLLM open-source project by funding development and expanding support for new model architectures, hardware platforms, and large multi-node deployments.
The second is to build a commercial inference platform on top of this foundation. The company plans to develop what it describes as a universal inference layer, offering production-grade features such as serverless deployment, observability, troubleshooting, and disaster recovery, likely delivered via Kubernetes-based infrastructure.
Rather than competing directly with cloud providers or model hosts, Inferact says it intends to work alongside existing platforms to simplify AI serving for engineering teams.
Making AI serving simpler at scale
Inferact’s long-term ambition is to remove the need for large, specialised infrastructure teams to deploy and operate AI models at scale. By abstracting away complexity in inference, the company aims to make production AI systems easier to run as models continue to grow in size and capability.
With a large seed round and deep roots in one of the most widely adopted inference projects, Inferact is positioning itself as a core infrastructure player in the next phase of AI deployment.