
A frontier AI lab attacking the compute crisis from the kernel up. We design the inference fabric, run our own provider fleet, build the IDE we use to build everything else, and ship the apps that turn all of it into revenue.
Six layers between an electron and a product action.
We build at every level — not because we have to, but because the only way to ship frontier AI on a non-frontier budget is to optimize end-to-end.
- L0 Silicon
  Electrons. Where physics ends and software begins.
  NVIDIA H100/H200 · Cerebras WSE · Groq LPU · Apple Silicon · Nebius
- L1 Kernels
  CUDA, Metal, WebGPU, llama.cpp. Where math meets the chip.
  Flash attention · Q8 KV cache · GPU layer offload · Metal-tuned llama.cpp
- L2 Models
  Frontier weights from across the field, open and proprietary, normalized to one shape.
  DeepSeek V3.2 / V4 Pro · Qwen3 235B / 235B Thinking · GLM 4.7 / 5.1 · Kimi K2.5 · Sonnet-shape alias · Falcon 3 (on-device)
- L3 Gateway
  Dragon. Adaptive routing across the fleet. The control plane every product runs on.
  Adaptive load balancer · Cluster mode · Per-tenant budgets · Guardrails
- L4 Agents
  Planning loops, tool use, memory. The reasoning layer that turns inference into action.
  Vellora Planner · BDR Agents · Workspace Agents · Voice Realtime
- L5 Products
  What users touch. Real revenue. The proof that the stack below isn't theory.
  Vellora · Mithrandir IDE · TokenMax · Bublr · Optamax
Six layers, one loop.
Every Vellora product collapses into the same agent loop running on the same substrate. Voice, code, pipeline state, mesh, energy — different surfaces, identical core. This is what compute looks like when it's ours end‑to‑end.
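One way to picture that shared loop is a plan, act, observe cycle. The sketch below is illustrative only: `Step`, `TOOLS`, `fake_model`, and `run_agent` are hypothetical names, not Vellora or Dragon APIs, and the stub model stands in for a real inference call through the gateway.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Step:
    """One decision from the model: call a tool, or finish with an answer."""
    tool: Optional[str] = None
    arg: str = ""
    answer: Optional[str] = None

# Hypothetical tool registry; each surface (voice, code, energy) would add its own.
TOOLS: Dict[str, Callable[[str], str]] = {
    "upper": lambda text: text.upper(),
}

PREFIX = "observation:"

def fake_model(memory: List[str]) -> Step:
    """Stand-in for an inference call: use one tool, then answer with its result."""
    if not any(m.startswith(PREFIX) for m in memory):
        return Step(tool="upper", arg=memory[0])
    return Step(answer=memory[-1][len(PREFIX):])

def run_agent(goal: str, model=fake_model, max_steps: int = 5) -> str:
    """Plan, act, observe until the model returns an answer or steps run out."""
    memory = [goal]                                  # memory starts with the goal
    for _ in range(max_steps):
        step = model(memory)                         # plan
        if step.answer is not None:
            return step.answer
        observation = TOOLS[step.tool](step.arg)     # act
        memory.append(PREFIX + observation)          # observe
    raise RuntimeError("agent did not finish within max_steps")
```

Under these assumptions, swapping the surface means swapping the entries in `TOOLS`; the loop itself stays the same.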
Inference at the speed of physics.
Dragon is our adaptive inference fabric. It load-balances across the entire frontier model fleet, picks the optimal provider per request, and sustains 500+ tokens/sec at up to 80% lower cost than direct frontier-model calls. Every Vellora app runs on it.
500+ tokens/sec · 1000+ models routed · up to 80% cheaper inference
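Dragon's actual routing policy is not described here, so the following is only a sketch of what "pick the optimal provider per request" could look like. It assumes a simple rule: choose the cheapest healthy provider that meets a throughput floor, otherwise fall back to the fastest healthy one. `Provider`, `pick_provider`, and every number in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Provider:
    name: str
    usd_per_mtok: float       # assumed price per million tokens
    tokens_per_sec: float     # assumed recently observed throughput
    healthy: bool = True

def pick_provider(providers: List[Provider], min_tps: float = 500.0) -> Provider:
    """Cheapest healthy provider at or above min_tps.

    If no healthy provider meets the floor, return the fastest healthy one.
    """
    healthy = [p for p in providers if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy providers")
    fast_enough = [p for p in healthy if p.tokens_per_sec >= min_tps]
    if fast_enough:
        return min(fast_enough, key=lambda p: p.usd_per_mtok)
    return max(healthy, key=lambda p: p.tokens_per_sec)
```

With a made-up fleet such as `[Provider("a", 3.0, 900), Provider("b", 0.5, 650), Provider("c", 0.2, 120)]`, this rule selects `b`: `c` is cheaper but misses the 500 tokens/sec floor. A production router would presumably also weigh latency, cache locality, and per-tenant budgets.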
$ curl -sS https://api.mws.run/v1/chat/completions \
    -H "Authorization: Bearer $MWS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "ping"}], ...}'
# Dragon picks the optimal provider ↓
{
  "model": "deepseek-ai/DeepSeek-V3.2-fast",
  "choices": [{ "message": { "content": "ok" } }],
  "usage": { "prompt_tokens": 1997, "cached_tokens": 1996 }
}
Specific. Measurable. Already shipping.
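The `usage` object in the sample response above makes the caching claim checkable: 1996 of 1997 prompt tokens were served from cache. A minimal sketch of that arithmetic follows. The field names come from the sample response; the helper `cache_hit_rate` is hypothetical, and how cached tokens are billed is not stated here.

```python
def cache_hit_rate(usage: dict) -> float:
    """Fraction of prompt tokens served from the prompt cache."""
    prompt = usage.get("prompt_tokens", 0)
    cached = usage.get("cached_tokens", 0)
    return cached / prompt if prompt else 0.0

# Figures copied from the sample response.
usage = {"prompt_tokens": 1997, "cached_tokens": 1996}
rate = cache_hit_rate(usage)
print(f"{rate:.2%}")  # prints 99.95%
```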
Public repos
Open core, IDE, gateway, mesh messaging, on-device LLMs.
Provider integrations
Cerebras · Fireworks · Nebius · Vertex · Groq · NVIDIA · Anthropic-shape · Apple Silicon · OpenAI.
Stack layers
Silicon → Kernels → Models → Gateway → Agents → Products.
Tokens/sec
Sustained throughput on Dragon — frontier-class speed at a fraction of the cost.
Prompt cache hit rate
Repeated agentic system prompts on Nebius DeepSeek-V3.2-Fast — near-free after the first call.
Median planner latency
Tool-using agent loop, end-to-end, on Sonnet-class models.
From one substrate
Web, voice, mobile (TokenMax). Same agentic core, three runtimes.
Multi-provider by design
Routing changes are an env var. Models are commodities; the substrate is the moat.
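As a sketch of what "routing changes are an env var" could mean in practice, the snippet below reads a provider preference order from the environment. The variable name `DRAGON_PROVIDER_ORDER`, the default order, and the function `provider_order` are all assumptions made for illustration; the real configuration surface is not documented here.

```python
import os
from typing import List, Mapping

# Illustrative default; not a documented Dragon setting.
DEFAULT_ORDER = ["nebius", "cerebras", "groq"]

def provider_order(env: Mapping[str, str] = os.environ) -> List[str]:
    """Parse a comma-separated provider list from a hypothetical env var.

    Blank entries are ignored, names are lower-cased, and an unset or empty
    variable falls back to DEFAULT_ORDER.
    """
    raw = env.get("DRAGON_PROVIDER_ORDER", "")
    names = [name.strip().lower() for name in raw.split(",") if name.strip()]
    return names or list(DEFAULT_ORDER)
```

Under this assumption, switching the preferred provider is a redeploy with `DRAGON_PROVIDER_ORDER=groq,cerebras` rather than a code change.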
Built on the stack.
Mithrandir builds them. Dragon runs them. Some are ours outright; some are independent companies we backed. All of them ship revenue today.
Dragon
Inference platform
Adaptive load-balanced inference across the frontier model fleet. 500+ tokens/sec sustained, up to 80% cheaper than direct frontier calls, 1000+ models behind one endpoint.
Meet Dragon
Mithrandir IDE
AI-native dev platform
Built on the VS Code foundation, fused with Dragon. The IDE the lab uses to build everything else — agentic editing, embedded inference, private by default.
See Mithrandir
Vellora
Revenue Operating System
Autonomous revenue OS for modern sales teams — voice, email, meetings, and pipeline state collapse into one agent loop. Independent company, built by Mithrandir, running on Dragon. We invested.
Open Vellora
Optamax
Energy Operating System
Grid-scale energy optimization — autonomous bidding, demand response, and battery dispatch for the energy-AI flywheel. Independent company, built by Mithrandir, running on Dragon. We invested.
See Optamax
What we ship for free.
Non-profit work. No paywall, no roadmap pressure — software for the long tail because the stack already exists.
TokenMax
On-device frontier LLMs
Private AI chat that never leaves your phone. Falcon 3 1B/3B/7B on Metal-tuned llama.cpp, with separate bundled on-device models for vision, voice, RAG, and tools. Free, forever.
Bublr
Decentralized messaging
Anonymous mesh messaging over Bluetooth + Nostr. Daily-rotating identities, geohash channels, 270+ relay fallback. Communication that works when the network doesn't. Free, forever.
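Bublr's identity scheme is not specified here. One common way to get daily-rotating pseudonyms is to derive each day's identifier from a long-lived local secret with an HMAC, so identifiers from different days cannot be linked without that secret. The sketch below assumes that approach; `daily_identity` is a hypothetical helper, and a Nostr deployment would presumably derive a keypair rather than a short tag.

```python
import hashlib
import hmac
from datetime import date

def daily_identity(secret: bytes, day: date) -> str:
    """Derive a 16-hex-character pseudonym for a given day.

    The same secret and day always give the same tag; a different day gives
    an unrelated-looking tag. Illustrative only, not Bublr's protocol.
    """
    message = day.isoformat().encode("utf-8")   # e.g. b"2025-01-01"
    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return tag[:16]
```

The secret never leaves the device in this design; only the derived tag would be broadcast.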
Compute is the bottleneck of the next century. We're here to break it.
Frontier models cost too much, run too slow, and concentrate power in too few hands. The lab attacks the problem at every level — kernel, scheduler, gateway, agent, app.
We build the IDE we use to build everything else. We run the inference fabric every product calls. We open-source what accelerates the field. We ship products that turn it into revenue today.
If you're a founder, an operator, or an engineer who refuses to wait for permission — this is your kind of lab.