QuickSilver Pro vs DeepInfra
DeepInfra is the budget-friendly option among DeepSeek resellers. QuickSilver Pro is still lower: ~20% cheaper on DeepSeek V3 output, ~22% cheaper on DeepSeek R1 output. If you're cost-sensitive enough to already be on DeepInfra, the further savings compound. Same OpenAI-compatible API, two-line migration.
At a glance
| Feature | QuickSilver Pro | DeepInfra |
|---|---|---|
| Catalog focus | 3 open-source LLMs | 60+ open models, vision, audio |
| DeepSeek V3 output price | $0.70 / 1M | $0.88 / 1M |
| DeepSeek R1 output price | $1.70 / 1M | $2.19 / 1M |
| Cached input discount | Not yet | Yes (DeepSeek V3/V3.1) |
| Embeddings · audio · image | No | Yes |
| Dedicated deployments | No | Yes |
| OpenAI-compatible chat | Yes | Yes |
| Minimum top-up | $5 | $20 |
Pricing (per million tokens, USD)
Public list prices as of April 2026. DeepInfra also offers cached-input discounts (not shown).
| Model | QSP input | QSP output | DeepInfra input | DeepInfra output | Output savings |
|---|---|---|---|---|---|
| DeepSeek V3 | $0.24 | $0.70 | $0.28 | $0.88 | ~20% |
| DeepSeek R1 | $0.40 | $1.70 | $0.55 | $2.19 | ~22% |
| Qwen3.5-35B-A3B | $0.13 | $1.00 | Comparable | — | |
On a DeepSeek V3 workload (1M input + 300k output per day), QuickSilver Pro is $0.45/day vs DeepInfra's $0.54/day. The gap is smaller than against Together or Fireworks, but still meaningful at scale.
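The daily figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the DeepSeek V3 list prices from the pricing table; the `daily_cost` helper and the `PRICES` dictionary are illustrative names, and cached-input discounts are ignored.

```python
# List prices in USD per 1M tokens, copied from the pricing table (DeepSeek V3 row).
PRICES = {
    "QuickSilver Pro": {"input": 0.24, "output": 0.70},
    "DeepInfra": {"input": 0.28, "output": 0.88},
}


def daily_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one day's traffic at list price (no cache discount)."""
    price = PRICES[provider]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Workload from the text: 1M input tokens + 300k output tokens per day.
for name in PRICES:
    print(f"{name}: ${daily_cost(name, 1_000_000, 300_000):.2f}/day")
```

Swapping in your own token counts gives a quick first estimate before benchmarking on real traffic.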
Migration — two lines
```python
# Before: DeepInfra
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=os.environ["DEEPINFRA_KEY"],
)
r = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[{"role": "user", "content": "Hi"}],
)
```

```python
# After: QuickSilver Pro
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.quicksilverpro.io/v1",
    api_key=os.environ["QSP_KEY"],
)
r = client.chat.completions.create(
    model="deepseek-v3",
    messages=[{"role": "user", "content": "Hi"}],
)
```
deepseek-ai/DeepSeek-V3 → deepseek-v3
deepseek-ai/DeepSeek-R1 → deepseek-r1
Qwen/Qwen3.5-35B-A3B → qwen3.5-35b
Honest tradeoffs
Choose QuickSilver Pro if:
- You want the lowest per-token list price on DeepSeek V3 and R1.
- Your workload doesn't benefit much from DeepInfra's cache discount (low repeat-prompt ratio).
- You want a $5 minimum top-up instead of $20.

Stay on DeepInfra if:
- You rely on their cached-input discount (>50% cache hit rate).
- You use embeddings, Whisper audio, or image models.
- You need Llama, Mistral, or other open models beyond DeepSeek and Qwen.
- You want serverless GPU for your own custom models (container-based hosting, per-second billing). We only serve three curated models.
- You can tolerate latency for discounted batch inference. DeepInfra offers a batch endpoint; we only serve real-time.
- Your app spans modalities beyond text: vision, OCR, speech-to-text, and TTS are all in DeepInfra's catalog and out of ours.
FAQ
How much cheaper is it?
On list pricing: ~14% cheaper input + ~20% cheaper output on DeepSeek V3. ~27% cheaper input + ~22% cheaper output on DeepSeek R1. Cached-input pricing on DeepInfra can change the math; compare effective per-request cost for cache-heavy workloads.
How do I migrate?
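Two lines: swap base_url to https://api.quicksilverpro.io/v1 and use your QuickSilver Pro API key. Then update the model ID: drop the deepseek-ai/ or Qwen/ prefix and use the lowercase short name (deepseek-v3, deepseek-r1, qwen3.5-35b).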
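For codebases that pass model IDs around in many places, the rename can be centralised. The sketch below assumes the three ID pairs from the migration section; `MODEL_MAP`, `to_qsp_model`, and `qsp_client_kwargs` are hypothetical helper names, not part of either provider's SDK.

```python
import os

# DeepInfra model ID -> QuickSilver Pro model ID, as listed in the migration section.
MODEL_MAP = {
    "deepseek-ai/DeepSeek-V3": "deepseek-v3",
    "deepseek-ai/DeepSeek-R1": "deepseek-r1",
    "Qwen/Qwen3.5-35B-A3B": "qwen3.5-35b",
}

QSP_BASE_URL = "https://api.quicksilverpro.io/v1"


def to_qsp_model(deepinfra_model: str) -> str:
    """Translate a DeepInfra model ID; fail loudly for models outside the three listed."""
    try:
        return MODEL_MAP[deepinfra_model]
    except KeyError:
        raise ValueError(
            f"No QuickSilver Pro equivalent listed for {deepinfra_model!r}"
        ) from None


def qsp_client_kwargs() -> dict:
    """Keyword arguments for openai.OpenAI(...), reading the key from QSP_KEY."""
    return {"base_url": QSP_BASE_URL, "api_key": os.environ["QSP_KEY"]}
```

With the `openai` package installed, `OpenAI(**qsp_client_kwargs())` builds the client, and `to_qsp_model(...)` can wrap each existing `model=` argument. Raising on unknown IDs surfaces any Llama or Mistral calls that have no equivalent here.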
Does QuickSilver Pro support prompt caching?
Not yet as a separate rate. DeepInfra's cached-input discount can lower effective input cost for repeat prompts. Benchmark both if cache-hit ratio is material for your workload.
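One way to frame that benchmark is to ask what cache-hit ratio makes DeepInfra's blended input price match QuickSilver Pro's flat input price. The sketch below uses the DeepSeek V3 input list prices from the pricing table; the cached rate is a placeholder assumption (half of list), not a published DeepInfra price, and output prices are left out entirely.

```python
def effective_input_price(full_price: float, cached_price: float, hit_ratio: float) -> float:
    """Blended USD price per 1M input tokens when hit_ratio of tokens bill at cached_price."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return (1.0 - hit_ratio) * full_price + hit_ratio * cached_price


def break_even_hit_ratio(full_price: float, cached_price: float, flat_price: float) -> float:
    """Hit ratio at which the blended price equals a competitor's flat price."""
    if cached_price >= full_price:
        raise ValueError("cached_price must be below full_price")
    return (full_price - flat_price) / (full_price - cached_price)


# DeepSeek V3 input list prices from the pricing table (USD per 1M tokens).
DEEPINFRA_V3_INPUT = 0.28
QSP_V3_INPUT = 0.24
# ASSUMPTION: placeholder cached rate; substitute the rate from DeepInfra's price list.
ASSUMED_CACHED_PRICE = 0.14

ratio = break_even_hit_ratio(DEEPINFRA_V3_INPUT, ASSUMED_CACHED_PRICE, QSP_V3_INPUT)
print(f"Break-even cache hit ratio on input: {ratio:.0%}")
```

Because output tokens are still billed at list price on both sides, a workload needs a hit ratio above this input-only break-even before DeepInfra's total per-request cost comes out lower.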
What about embeddings / audio / images?
Not offered. QuickSilver Pro is chat completions only on three LLMs. DeepInfra covers those modalities.
Monthly cost walkthrough
Consider an indie SaaS app that mixes hobby and production traffic: V3 handles general chat and R1 powers the "explain your reasoning" feature. Monthly footprint: 10M input tokens and 3M output tokens, split 50/50 between V3 and R1.
| Line item | QuickSilver Pro | DeepInfra |
|---|---|---|
| V3 input | 5M × $0.24 = $1.20 | 5M × $0.28 = $1.40 |
| V3 output | 1.5M × $0.70 = $1.05 | 1.5M × $0.88 = $1.32 |
| R1 input | 5M × $0.40 = $2.00 | 5M × $0.55 = $2.75 |
| R1 output | 1.5M × $1.70 = $2.55 | 1.5M × $2.19 = $3.29 |
| Total | $6.80/mo | $8.76/mo |
That's $1.96 saved each month, ~22% off. The delta looks small in absolute terms because DeepInfra is already aggressively priced — but the shape of the savings is worth noting: R1 contributes ~$1.49 of the $1.96, so the heavier your reasoning usage gets, the more pronounced the gap. Cache-hit-heavy workloads on DeepInfra can close some of this — benchmark on real traffic before switching.
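The walkthrough can be reproduced, and adapted to a different V3/R1 mix, with a short script. This is a sketch using the list prices and volumes stated above; the names `PRICES`, `VOLUME_M`, `model_cost`, and `cents` are illustrative. `Decimal` is used so that rounding to cents behaves like the hand calculation.

```python
from decimal import Decimal, ROUND_HALF_UP

# USD per 1M tokens as (input, output), from the pricing table.
PRICES = {
    "QuickSilver Pro": {"V3": ("0.24", "0.70"), "R1": ("0.40", "1.70")},
    "DeepInfra": {"V3": ("0.28", "0.88"), "R1": ("0.55", "2.19")},
}
# Monthly volume per model in millions of tokens as (input, output):
# 10M input and 3M output, split 50/50 between V3 and R1.
VOLUME_M = {"V3": ("5", "1.5"), "R1": ("5", "1.5")}


def model_cost(provider: str, model: str) -> Decimal:
    """Unrounded monthly USD cost of one model on one provider."""
    in_price, out_price = (Decimal(p) for p in PRICES[provider][model])
    in_vol, out_vol = (Decimal(v) for v in VOLUME_M[model])
    return in_vol * in_price + out_vol * out_price


def cents(amount: Decimal) -> Decimal:
    """Round to whole cents, halves rounding up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


totals = {p: sum((model_cost(p, m) for m in VOLUME_M), Decimal("0")) for p in PRICES}
savings = totals["DeepInfra"] - totals["QuickSilver Pro"]
r1_share = model_cost("DeepInfra", "R1") - model_cost("QuickSilver Pro", "R1")

for provider, total in totals.items():
    print(f"{provider}: ${cents(total)}/mo")
print(f"Saved: ${cents(savings)}/mo, of which R1: ${cents(r1_share)}")
```

Editing `VOLUME_M` to weight R1 more heavily shows how the gap widens with reasoning usage. Cached-input discounts are not modelled here.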
Uptime & reliability
QuickSilver Pro is in a bridge phase: requests route across multiple upstream inference providers on the same open-source weights. If one upstream has an outage or hits capacity, the router fails over to the next. Per-model availability, p50 / p95 latency, and incident history are published on our status page. Our own GPU capacity comes online in Q2 2026 and the routing changes shape at that point.
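The failover behaviour described above follows a common pattern: try upstreams in order and return the first successful reply. The sketch below illustrates that pattern only; it is not QuickSilver Pro's actual router, and the upstream names, handlers, and `call_with_failover` function are hypothetical stand-ins.

```python
from typing import Callable, Sequence


class AllUpstreamsFailed(Exception):
    """Raised when every upstream in the list has errored."""


def call_with_failover(
    upstreams: Sequence[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try each (name, handler) in order; return (name, reply) from the first success."""
    errors: list[str] = []
    for name, handler in upstreams:
        try:
            return name, handler(prompt)
        except Exception as exc:  # a production router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllUpstreamsFailed("; ".join(errors))


# Hypothetical upstreams: the first simulates an outage, the second answers.
def upstream_a(prompt: str) -> str:
    raise ConnectionError("capacity exceeded")


def upstream_b(prompt: str) -> str:
    return f"echo: {prompt}"


served_by, reply = call_with_failover(
    [("upstream-a", upstream_a), ("upstream-b", upstream_b)], "Hi"
)
print(served_by, reply)
```

Sequential failover adds the failed attempt's latency to the request, which is one reason to check the published p95 numbers rather than p50 alone.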
DeepInfra operates its own GPU fleet and does not publish a real-time public status page or uptime dashboard at the time of writing — we don't want to invent numbers we can't verify. Their incident communication runs through their community Discord and status posts rather than a dedicated URL we can cite. If uptime transparency is load-bearing for your decision, both teams will share recent incident data on request; don't decide on PR puff either way.