Vellora Labs — Frontier AI lab attacking the compute crisis

Vellora Labs · Frontier AI · Electron → Product

A frontier AI lab attacking the compute crisis from the kernel up. We design the inference fabric, run our own provider fleet, build the IDE we use to build everything else, and ship the apps that turn all of it into revenue.

Mithrandir builds. Dragon runs. 1000+ models. 500+ tokens/sec.

Cerebras: Qwen3 235B · GLM 4.7
Fireworks: DeepSeek V4 Pro · GLM 5.1
Nebius: DeepSeek V3.2 · Kimi K2.5 · Qwen3.5 397B
Vertex AI: Embeddings · Rerank
Groq: Llama 4 · LPU
NVIDIA: H100 · H200 · GB200
Anthropic: Sonnet alias · drop-in
Apple Silicon: M-series · Metal · llama.cpp
OpenAI: Embeddings · Whisper · Realtime

The Stack

Six layers between an electron and a product action.

We build at every level, not for its own sake, but because the only way to ship frontier AI on a non-frontier budget is to optimize end-to-end.

L0 · Silicon
Electrons. Where physics ends and software begins.
NVIDIA H100/H200 · Cerebras WSE · Groq LPU · Apple Silicon · Nebius

L1 · Kernels
CUDA, Metal, WebGPU, llama.cpp. Where math meets the chip.
Flash attention · Q8 KV cache · GPU layer offload · Metal-tuned llama.cpp

L2 · Models
Frontier weights from across the field, open and proprietary, normalized to one shape.
DeepSeek V3.2 / V4 Pro · Qwen3 235B / 235B Thinking · GLM 4.7 / 5.1 · Kimi K2.5 · Sonnet-shape alias · Falcon 3 (on-device)

L3 · Gateway
Dragon. Adaptive routing across the fleet. The control plane every product runs on.
Adaptive load balancer · Cluster mode · Per-tenant budgets · Guardrails

L4 · Agents
Planning loops, tool use, memory. The reasoning layer that turns inference into action.
Vellora Planner · BDR Agents · Workspace Agents · Voice Realtime

L5 · Products
What users touch. Real revenue. The proof that the stack below isn't theory.
Vellora · Mithrandir IDE · TokenMax · Bublr · Optamax
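The gateway layer lists per-tenant budgets. The page doesn't say how Dragon enforces them, so what follows is only a minimal sketch under our own assumptions: budgets are USD caps, each request's cost is known before dispatch, and `BudgetLedger` and `try_charge` are hypothetical names, not part of any Vellora API.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetLedger:
    """Hypothetical per-tenant spend ledger. Limits and costs are in USD."""
    limits: dict[str, float]
    spent: dict[str, float] = field(default_factory=dict)

    def try_charge(self, tenant: str, cost: float) -> bool:
        """Record `cost` against `tenant` and return True, or return False
        without recording anything if the charge would exceed the limit."""
        limit = self.limits.get(tenant, 0.0)  # assumption: unknown tenants get no budget
        used = self.spent.get(tenant, 0.0)
        if used + cost > limit:
            return False  # reject before the request reaches a provider
        self.spent[tenant] = used + cost
        return True
```

Checking the budget before dispatch, rather than reconciling afterwards, means a rejected request never consumes provider capacity.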

The agentic core

Six layers, one loop.

Every Vellora product collapses into the same agent loop running on the same substrate. Voice, code, pipeline state, mesh, energy — different surfaces, identical core. This is what compute looks like when it's ours end‑to‑end.
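The page doesn't publish the loop itself. As a rough illustration of the plan, act, observe shape described above, here is a sketch that assumes a hypothetical `model` callable returning either a tool call or a final answer. `run_agent`, the step dictionary format, and the `"tool"` message role are all assumptions for illustration.

```python
from typing import Callable

# Assumed step format from the model: {"tool": name, "args": {...}} or {"final": text}.
Step = dict
Model = Callable[[list[dict]], Step]


def run_agent(model: Model, tools: dict[str, Callable[..., str]],
              task: str, max_steps: int = 8) -> str:
    """Minimal plan -> act -> observe loop over a shared message history."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = model(history)  # plan: ask the model what to do next
        if "final" in step:
            return step["final"]  # done: the model produced an answer
        result = tools[step["tool"]](**step.get("args", {}))  # act: run the chosen tool
        history.append({"role": "tool", "name": step["tool"], "content": result})  # observe
    raise RuntimeError("agent did not finish within max_steps")
```

A stub model that calls one tool and then answers is enough to exercise the loop; the `max_steps` cap keeps a model that never converges from looping forever.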

Dragon · L3 Gateway

Inference at the speed of physics.

Dragon is our adaptive inference fabric. It load-balances across the entire frontier model fleet, picks the optimal provider per request, and sustains 500+ tokens/sec at up to 80% lower cost than direct frontier-model calls. Every Vellora app runs on it.
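How Dragon scores providers isn't described here. One simple policy that fits the description, offered as an assumption rather than Dragon's actual algorithm: among healthy providers, take the cheapest one that meets a throughput floor, and keep each throughput estimate adaptive with an exponentially weighted moving average. `Provider`, `pick_provider`, and `record_throughput` are hypothetical names; any provider names and prices used with them are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    usd_per_mtok: float    # price per million tokens (illustrative)
    tokens_per_sec: float  # running estimate of observed throughput
    healthy: bool = True


def pick_provider(providers: list[Provider], min_tps: float) -> Provider:
    """Cheapest healthy provider that meets the throughput floor;
    falls back to the fastest healthy provider if none does."""
    healthy = [p for p in providers if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy providers")
    fast_enough = [p for p in healthy if p.tokens_per_sec >= min_tps]
    if fast_enough:
        return min(fast_enough, key=lambda p: p.usd_per_mtok)
    return max(healthy, key=lambda p: p.tokens_per_sec)


def record_throughput(p: Provider, observed_tps: float, alpha: float = 0.2) -> None:
    """EWMA update: recent observations shift the estimate without erasing history."""
    p.tokens_per_sec = (1 - alpha) * p.tokens_per_sec + alpha * observed_tps
```

With a 500 tokens/sec floor, this router keeps sending traffic to a cheap provider until its observed throughput drifts below the floor, then shifts to the next cheapest provider that still clears it.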

500+ tokens/sec

1000+ models routed

Up to 80% cheaper inference

Visit mws.run

$ curl -sS https://api.mws.run/v1/chat/completions \
    -H "Authorization: Bearer $MWS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "ping"}], ...}'

# Dragon picks the optimal provider ↓

{
  "model": "deepseek-ai/DeepSeek-V3.2-fast",
  "choices": [{ "message": { "content": "ok" } }],
  "usage": { "prompt_tokens": 1997, "cached_tokens": 1996 }
}
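The `usage` block in the response above is enough to compute the cache hit rate: cached prompt tokens divided by total prompt tokens. A minimal sketch, assuming the field names exactly as they appear in that sample response; `cache_hit_rate` and `uncached_tokens` are illustrative helper names.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from the prompt cache."""
    prompt = usage["prompt_tokens"]
    return usage.get("cached_tokens", 0) / prompt if prompt else 0.0


def uncached_tokens(usage: dict) -> int:
    """Prompt tokens that had to be processed from scratch."""
    return usage["prompt_tokens"] - usage.get("cached_tokens", 0)
```

For the sample response, 1996 / 1997 is about 0.9995: 99.95% of the prompt came from cache and only one token was processed fresh, in line with the 99.9% figure.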

99.9% cache hit · 1.5s median

By the numbers

Specific. Measurable. Already shipping.

Public repos

Open core, IDE, gateway, mesh messaging, on-device LLMs.

9

Provider integrations

Cerebras · Fireworks · Nebius · Vertex · Groq · NVIDIA · Anthropic-shape · Apple Silicon · OpenAI.

6

Stack layers

Silicon → Kernels → Models → Gateway → Agents → Products.

500+

Tokens/sec

Sustained throughput on Dragon — frontier-class speed at a fraction of the cost.

99.9%

Prompt cache hit rate

Repeated agentic system prompts on Nebius DeepSeek-V3.2-Fast — near-free after the first call.

Median planner latency

Tool-using agent loop, end-to-end, on Sonnet-class models.

3 surfaces

From one substrate

Web, voice, mobile (TokenMax). Same agentic core, three runtimes.

0 lock-in

Multi-provider by design

Routing changes are an env var. Models are commodities; the substrate is the moat.
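One way to read "routing changes are an env var" is that the client never hard-codes a model or endpoint. A minimal standard-library sketch that builds, without sending, a request shaped like the curl example. `DRAGON_BASE_URL` and `DRAGON_MODEL` are hypothetical variable names; only `MWS_API_KEY`, the endpoint, and the `claude-sonnet-4-6` alias come from that example.

```python
import json
import os
import urllib.request


def build_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions request whose target comes from the environment."""
    base = os.environ.get("DRAGON_BASE_URL", "https://api.mws.run/v1")  # hypothetical var
    model = os.environ.get("DRAGON_MODEL", "claude-sonnet-4-6")         # hypothetical var
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('MWS_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Pointing a service at a different model then becomes a deployment setting such as `DRAGON_MODEL=deepseek-ai/DeepSeek-V3.2-fast`, with no code edits.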

Products

Built on the stack.

Mithrandir builds them. Dragon runs them. Some are ours outright; some are independent companies we backed. All of them ship revenue today.

Dragon

Inference platform

Live

Adaptive load-balanced inference across the frontier model fleet. 500+ tokens/sec sustained, up to 80% cheaper than direct frontier calls, 1000+ models behind one endpoint.

Mithrandir IDE

AI-native dev platform

Live

Built on the VS Code foundation, fused with Dragon. The IDE the lab uses to build everything else — agentic editing, embedded inference, private by default.

Vellora

Revenue Operating System

Portfolio · Live

Autonomous revenue OS for modern sales teams — voice, email, meetings, and pipeline state collapse into one agent loop. Independent company, built by Mithrandir, running on Dragon. We invested.

Optamax

Energy Operating System

Portfolio · Live

Grid-scale energy optimization — autonomous bidding, demand response, and battery dispatch for the energy-AI flywheel. Independent company, built by Mithrandir, running on Dragon. We invested.

Impact

What we ship for free.

Non-profit work. No paywall, no roadmap pressure — software for the long tail because the stack already exists.

TokenMax

On-device frontier LLMs

Beta

Private AI chat that never leaves your phone. Falcon 3 1B/3B/7B on Metal-tuned llama.cpp, plus separate bundled on-device models for vision, voice, RAG, and tools. Free, forever.

Bublr

Decentralized messaging

Beta

Anonymous mesh messaging over Bluetooth + Nostr. Daily-rotating identities, geohash channels, and fallback across 270+ relays. Communication that works when the network doesn't. Free, forever.

The thesis

Compute is the bottleneck of the next century. We're here to break it.

Frontier models cost too much, run too slow, and concentrate power in too few hands. The lab attacks the problem at every level — kernel, scheduler, gateway, agent, app.

We build the IDE we use to build everything else. We run the inference fabric every product calls. We open-source what accelerates the field. We ship products that turn it into revenue today.

If you're a founder, an operator, or an engineer who refuses to wait for permission — this is your kind of lab.

Try Vellora · hello@vellora.ai