Status & Roadmap — QuickSilver Pro

Services

90-day history · client-measured

Model availability

Synthetic probe · updates every 3 min

Roadmap — how we become a real inference company

Now — bridge phase while our GPU capacity comes online

Live

A transitional tactic so customers can already save 20% today. Each request is dispatched to the cheapest healthy open-source backend at that instant; because we focus on only three models, the route table stays hot and verified. Downside of the bridge: system_fingerprint cannot be stable, because the backend varies by call. Stable fingerprints land with Phase 2, our own inference stack (Q2 2026).
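The dispatch rule above can be sketched roughly as follows. This is a minimal illustration, not our production router: the Backend record, its field names, the backend and model identifiers, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    model: str
    price_per_mtok: float  # assumed unit: USD per million tokens
    healthy: bool


def pick_backend(backends: list[Backend], model: str) -> Optional[Backend]:
    """Return the cheapest healthy backend serving `model`, or None."""
    candidates = [b for b in backends if b.model == model and b.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.price_per_mtok)


# Made-up route table for a single model.
ROUTES = [
    Backend("backend-a", "deepseek-v3", 0.30, healthy=True),
    Backend("backend-b", "deepseek-v3", 0.25, healthy=False),
    Backend("backend-c", "deepseek-v3", 0.28, healthy=True),
]

chosen = pick_backend(ROUTES, "deepseek-v3")
print(chosen.name if chosen else "no healthy backend")
```

Because the cheapest healthy backend can change from one call to the next, two identical requests may be served by different backends, which is why the fingerprint varies during the bridge phase.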

Q2 2026 — our own inference stack on H100/H200

Planned

Self-hosted serving on dedicated GPUs using SGLang + continuous batching, EAGLE-3 speculative decoding, FP8 quantization via DeepGEMM, and SageAttention / ThunderMLA custom kernels. At that point system_fingerprint becomes stable (it changes only when we rev the stack), and repeatable-seed workflows start working properly. Target: 30-50% below current prices on DeepSeek V3.
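For repeatable-seed workflows, a client can check that a set of responses came from one serving configuration before comparing outputs. A minimal sketch, assuming each response is a dict carrying an OpenAI-style system_fingerprint field; the same_stack helper and the sample fingerprint values are hypothetical.

```python
from typing import Iterable, Mapping


def same_stack(responses: Iterable[Mapping[str, object]]) -> bool:
    """True if every response carries the same non-empty system_fingerprint."""
    fingerprints = {r.get("system_fingerprint") for r in responses}
    # Exactly one distinct value, and it must not be None or an empty string.
    return len(fingerprints) == 1 and all(fingerprints)


# Made-up responses: the backend varies by call vs. a single stable stack.
bridge = [{"system_fingerprint": "fp_a"}, {"system_fingerprint": "fp_b"}]
stable = [{"system_fingerprint": "fp_a"}, {"system_fingerprint": "fp_a"}]

print(same_stack(bridge))  # False
print(same_stack(stable))  # True
```

When the check fails, differences between seeded outputs may come from the backend changing rather than from the prompt or parameters.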

H2 2026 — colocated data center + AIDC partnerships

Future

Move from rented GPUs (Vast.ai) to self-owned or colocated racks. Partner with AI-datacenter operators where that makes sense. The goal is the cheapest reliable inference for open-source models on the planet: full stack, our engineering.

About this page. Service rows run client-side probes from your browser. Model rows reflect a real 1-token probe that our backend sends every 3 minutes. Historical bars show recent probe results stored in this browser's localStorage, so the history does not carry over if you switch devices or browsers.