Vellora Labs · Frontier AI · Electron → Product
Vellora

A frontier AI lab attacking the compute crisis from the kernel up. We design the inference fabric, run our own provider fleet, build the IDE we use to build everything else, and ship the apps that turn all of it into revenue.

Mithrandir builds. Dragon runs. 1000+ models. 500+ tokens/sec.
Cerebras: Qwen3 235B · GLM 4.7
Fireworks: DeepSeek V4 Pro · GLM 5.1
Nebius: DeepSeek V3.2 · Kimi K2.5 · Qwen3.5 397B
Vertex AI: Embeddings · Rerank
Groq: Llama 4 · LPU
NVIDIA: H100 · H200 · GB200
Anthropic: Sonnet alias · drop-in
Apple Silicon: M-series · Metal · llama.cpp
OpenAI: Embeddings · Whisper · Realtime
The Stack

Six layers between an electron and a product action.

We build at every level — not because we have to, but because the only way to ship frontier AI on a non-frontier budget is to optimize end-to-end.

  1. L0

    Silicon

    Electrons. Where physics ends and software begins.

    NVIDIA H100/H200 · Cerebras WSE · Groq LPU · Apple Silicon · Nebius
  2. L1

    Kernels

    CUDA, Metal, WebGPU, llama.cpp. Where math meets the chip.

    Flash attention · Q8 KV cache · GPU layer offload · Metal-tuned llama.cpp
  3. L2

    Models

    Frontier weights from across the field — open and proprietary, normalized to one shape.

    DeepSeek V3.2 / V4 Pro · Qwen3 235B / 235B Thinking · GLM 4.7 / 5.1 · Kimi K2.5 · Sonnet-shape alias · Falcon 3 (on-device)
  4. L3

    Gateway

    Dragon. Adaptive routing across the fleet. The control plane every product runs on.

    Adaptive load balancer · Cluster mode · Per-tenant budgets · Guardrails
  5. L4

    Agents

    Planning loops, tool use, memory. The reasoning layer that turns inference into action.

    Vellora Planner · BDR Agents · Workspace Agents · Voice Realtime
  6. L5

    Products

    What users touch. Real revenue. The proof that the stack below isn't theory.

    Vellora · Mithrandir IDE · TokenMax · Bublr · Optamax
The agentic core

Six layers, one loop.

Every Vellora product collapses into the same agent loop running on the same substrate. Voice, code, pipeline state, mesh, energy — different surfaces, identical core. This is what compute looks like when it's ours end‑to‑end.
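For readers who want a concrete picture of "one loop", here is a minimal sketch of a plan, act, observe cycle. It is a generic illustration, not Vellora's agent code: `run_agent`, the `Step` tuple shape, the `stub_planner`, and the `upper` tool are all hypothetical names invented for this example, and a real planner would call a model instead of a stub.

```python
from typing import Callable, Dict, List, Tuple

# A planner returns ("tool", tool_name, argument) or ("final", answer, "").
Step = Tuple[str, str, str]
Planner = Callable[[str, List[str]], Step]


def run_agent(goal: str, planner: Planner,
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Plan, call a tool, record the observation, repeat until a final answer."""
    memory: List[str] = []  # observations carried between steps
    for _ in range(max_steps):
        kind, name, arg = planner(goal, memory)
        if kind == "final":
            return name
        if name not in tools:
            # Record the failure so the planner can react on the next step.
            memory.append(f"error: unknown tool {name!r}")
            continue
        memory.append(f"{name}({arg}) -> {tools[name](arg)}")
    raise RuntimeError("step budget exhausted without a final answer")


# Stub planner standing in for a model call: use one tool, then answer.
def stub_planner(goal: str, memory: List[str]) -> Step:
    if not memory:
        return ("tool", "upper", goal)
    return ("final", memory[-1], "")


result = run_agent("ping", stub_planner, {"upper": str.upper})
print(result)  # upper(ping) -> PING
```

Different surfaces would swap the tool table and the planner while keeping the loop itself unchanged, which is the point the paragraph above makes.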

Dragon · L3 Gateway

Inference at the speed of physics.

Dragon is our adaptive inference fabric. It load-balances across the entire frontier model fleet, picks the optimal provider per request, and sustains 500+ tokens/sec at up to 80% lower cost than direct frontier-model calls. Every Vellora app runs on it.
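Per-request provider selection can be pictured as a small filtering and ranking problem. The sketch below is illustrative only and does not describe Dragon's actual routing logic: the `Provider` fields, the "cheapest provider above a throughput floor" rule, and every number in the sample fleet are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str               # hypothetical provider label
    tokens_per_sec: float   # assumed recent throughput
    usd_per_mtok: float     # assumed price per million tokens
    healthy: bool = True    # assumed health-check result


def pick_provider(fleet, min_tokens_per_sec=0.0):
    """Return the cheapest healthy provider that meets the throughput floor."""
    candidates = [
        p for p in fleet
        if p.healthy and p.tokens_per_sec >= min_tokens_per_sec
    ]
    if not candidates:
        raise RuntimeError("no healthy provider meets the throughput floor")
    # Cheapest first; break price ties in favor of higher throughput.
    return min(candidates, key=lambda p: (p.usd_per_mtok, -p.tokens_per_sec))


# Made-up fleet, for illustration only.
FLEET = [
    Provider("provider-a", tokens_per_sec=900.0, usd_per_mtok=2.00),
    Provider("provider-b", tokens_per_sec=550.0, usd_per_mtok=0.60),
    Provider("provider-c", tokens_per_sec=300.0, usd_per_mtok=0.30),
    Provider("provider-d", tokens_per_sec=700.0, usd_per_mtok=0.50, healthy=False),
]

choice = pick_provider(FLEET, min_tokens_per_sec=500.0)
print(choice.name)  # provider-b
```

With a 500 tokens/sec floor, the slow and the unhealthy providers drop out and the cheaper of the remaining two wins. A production router would presumably also weigh latency, cache locality, and tenant budgets.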

500+

Tokens/sec

1000+

Models routed

Up to 80%

Cheaper inference

dragon · live wire ● ok
$ curl -sS https://api.mws.run/v1/chat/completions \
    -H "Authorization: Bearer $MWS_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "ping"}], ...}'

# Dragon picks the optimal provider ↓

{
  "model": "deepseek-ai/DeepSeek-V3.2-fast",
  "choices": [{ "message": { "content": "ok" } }],
  "usage": { "prompt_tokens": 1997, "cached_tokens": 1996 }
}
99.9% cache hit
1.5s median
live now
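The `usage` block in the response above is enough to check the cache-hit figure by hand. This short sketch assumes `cached_tokens` counts the prompt tokens served from the prompt cache, which is how the field reads but is not stated on the page; the helper name `cache_hit_rate` is ours.

```python
import json

# Response body copied from the example above.
response = json.loads("""
{
  "model": "deepseek-ai/DeepSeek-V3.2-fast",
  "choices": [{ "message": { "content": "ok" } }],
  "usage": { "prompt_tokens": 1997, "cached_tokens": 1996 }
}
""")


def cache_hit_rate(usage):
    """Fraction of prompt tokens served from the prompt cache."""
    prompt = usage["prompt_tokens"]
    return usage["cached_tokens"] / prompt if prompt else 0.0


rate = cache_hit_rate(response["usage"])
print(f"{rate:.2%}")  # 99.95%
```

1996 of 1997 prompt tokens were cached, so only one token was billed at the uncached rate. That is consistent with the 99.9% shown in the widget.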
By the numbers

Specific. Measurable. Already shipping.

0+

Public repos

Open core, IDE, gateway, mesh messaging, on-device LLMs.

9

Provider integrations

Cerebras · Fireworks · Nebius · Vertex · Groq · NVIDIA · Anthropic-shape · Apple Silicon · OpenAI.

6

Stack layers

Silicon → Kernels → Models → Gateway → Agents → Products.

500+

Tokens/sec

Sustained throughput on Dragon — frontier-class speed at a fraction of the cost.

99.9%

Prompt cache hit rate

Repeated agentic system prompts on Nebius DeepSeek-V3.2-Fast — near-free after the first call.

<0s

Median planner latency

Tool-using agent loop, end-to-end, on Sonnet-class models.

3 surfaces

From one substrate

Web, voice, mobile (TokenMax). Same agentic core, three runtimes.

0 lock-in

Multi-provider by design

Routing changes are an env var. Models are commodities; the substrate is the moat.
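To make "routing changes are an env var" concrete, here is one way such an override could work. This is a sketch under stated assumptions, not Dragon's configuration format: the variable name `MODEL_ROUTES`, the `alias=target` syntax, and the helpers `load_routes` and `resolve` are all hypothetical. The single default entry mirrors the request and response models shown in the Dragon example.

```python
import os

# Default alias table. The one entry mirrors the request/response pair in the
# Dragon example; treating it as a static default is an assumption.
DEFAULT_ROUTES = {"claude-sonnet-4-6": "deepseek-ai/DeepSeek-V3.2-fast"}

# Hypothetical variable name; the page does not say which env var Dragon reads.
ROUTES_ENV_VAR = "MODEL_ROUTES"


def load_routes(environ=os.environ):
    """Merge 'alias=target,alias=target' overrides from the environment."""
    routes = dict(DEFAULT_ROUTES)
    raw = environ.get(ROUTES_ENV_VAR, "")
    for pair in filter(None, (p.strip() for p in raw.split(","))):
        alias, sep, target = pair.partition("=")
        if not sep or not alias.strip() or not target.strip():
            raise ValueError(f"malformed route override: {pair!r}")
        routes[alias.strip()] = target.strip()
    return routes


def resolve(model, routes):
    """Map a requested model name to its routed target (identity if unmapped)."""
    return routes.get(model, model)


print(resolve("claude-sonnet-4-6", load_routes({})))
# deepseek-ai/DeepSeek-V3.2-fast
```

Under this scheme, pointing the same alias at a different backend means setting `MODEL_ROUTES` and restarting, with no client code changes.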

Impact

What we ship for free.

Non-profit work. No paywall, no roadmap pressure — software for the long tail because the stack already exists.

TokenMax

On-device frontier LLMs

Beta

Private AI chat that never leaves your phone. Falcon 3 1B/3B/7B on Metal-tuned llama.cpp, bundled with separate on-device models for vision, voice, RAG, and tools. Free, forever.

Bublr

Decentralized messaging

Beta

Anonymous mesh messaging over Bluetooth + Nostr. Daily-rotating identities, geohash channels, 270+ relay fallback. Communication that works when the network doesn't. Free, forever.

The thesis

Compute is the bottleneck of the next century. We're here to break it.

Frontier models cost too much, run too slow, and concentrate power in too few hands. The lab attacks the problem at every level — kernel, scheduler, gateway, agent, app.

We build the IDE we use to build everything else. We run the inference fabric every product calls. We open-source what accelerates the field. We ship products that turn it into revenue today.

If you're a founder, an operator, or an engineer who refuses to wait for permission — this is your kind of lab.