Dollar-denominated payments, exchange rate swings, and token usage that inflates your bill at the end of the month: the cost of public LLMs can make scaling your AI operation unfeasible. Find out how much your company saves by running agents on a Private Cloud with optimized SLMs.
Before calculating, note that automation agents consume far more tokens per request than simple tasks, often by an order of magnitude or more. The exact numbers vary by use case.
Basic · Customer Service / Q&A: Summaries, short questions and answers, ticket classification. High volume of requests, but each one is compact. Ideal for an optimized SLM.
Intermediate: Code generation, ETL, SQL queries, BI analytics. Structured technical context and responses with multi-step logic.
Advanced: SDR, conversational BI, multiple tool calls with RAG and multi-turn reasoning. Each agent step multiplies consumption.
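To put numbers on the gap between tiers, this short Python sketch multiplies out the per-tier assumptions the calculator uses (requests/day and input/output tokens per request, over 22 business days). It only reproduces those stated assumptions; real agent workloads vary.

```python
# Per-tier workload assumptions from the calculator (hard-coded here):
# tier -> (requests/day, input tokens/request, output tokens/request)
TIERS = {
    "Basic": (40, 200, 300),
    "Intermediate": (40, 2_500, 1_200),
    "Advanced": (25, 12_000, 2_500),
}
BUSINESS_DAYS = 22  # business days per month, as in the calculator


def tokens_per_request(tier: str) -> int:
    """Input + output tokens for a single request."""
    _, tok_in, tok_out = TIERS[tier]
    return tok_in + tok_out


def tokens_per_user_month(tier: str) -> int:
    """requests/day x tokens/request x 22 business days."""
    req_day, _, _ = TIERS[tier]
    return req_day * tokens_per_request(tier) * BUSINESS_DAYS


base = tokens_per_request("Basic")
for name in TIERS:
    per_req = tokens_per_request(name)
    print(f"{name:<12} {per_req:>6,} tokens/request ({per_req / base:.1f}x Basic), "
          f"{tokens_per_user_month(name):>9,} tokens/user/month")
```

Under these assumptions consumption grows in large multiples rather than exponentially: an Advanced request uses 29× the tokens of a Basic one, and an Advanced user consumes about 18× the monthly tokens of a Basic user (the ratio is lower because Advanced assumes fewer requests per day).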
Fixed price per user/month · Dedicated GPU · vLLM · MDA LLM 2.1 (MoE FP8, 32B total · 3.3B active · 256k context) · Brazil data center · 99.5% SLA.
Price = users × price/user/month (with linear volume discount)
List price (100 users), per user/month: Basic R$ 103 · Intermediate R$ 107 · Advanced R$ 110
Floor (500+ users): R$ 89 (~USD 18) per user/month in all tiers · the discount is interpolated linearly between 100 and 500 users.
Includes dedicated GPU infrastructure (80 Basic · 40 Intermediate · 15 Advanced users/GPU), operations, SLA, and support. Usage window: 9 AM to 10 PM on 22 business days/month, under a fair-use policy.
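The per-user pricing rule can be sketched in Python as follows. The tier list prices, the R$ 89 floor, and the linear interpolation between 100 and 500 users come from the terms above; pricing below 100 users is not stated, so the sketch assumes the list price applies there (labeled in the code).

```python
# List price per user/month at 100 users, and the common floor at 500+ users (R$).
LIST_PRICE_BRL = {"Basic": 103.0, "Intermediate": 107.0, "Advanced": 110.0}
FLOOR_PRICE_BRL = 89.0
LIST_USERS, FLOOR_USERS = 100, 500


def mda_price_per_user(tier: str, users: int) -> float:
    """Monthly price per user (R$) after the linear volume discount."""
    list_price = LIST_PRICE_BRL[tier]
    if users <= LIST_USERS:
        # ASSUMPTION: below 100 users the list price applies (not stated in the terms).
        return list_price
    if users >= FLOOR_USERS:
        return FLOOR_PRICE_BRL
    # Linear interpolation between (100, list price) and (500, floor price).
    frac = (users - LIST_USERS) / (FLOOR_USERS - LIST_USERS)
    return list_price - frac * (list_price - FLOOR_PRICE_BRL)


def mda_monthly_cost(tier: str, users: int) -> float:
    """Price = users x price/user/month."""
    return users * mda_price_per_user(tier, users)


for users in (100, 200, 500):
    print(users, [round(mda_monthly_cost(t, users)) for t in LIST_PRICE_BRL])
```

With these inputs the sketch returns R$ 19,900 for 200 Basic users and R$ 44,500 for 500 users in any tier, in line with the comparison table.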
Enterprise Frontier 2026 pricing · USD per 1M tokens · billed based on actual usage.
Tokens/month = users × requests/day × tokens/request × 22 business days
Cost (R$) = (input tokens/1M × $in + output tokens/1M × $out) × R$ 5.00/USD
GPT-5.5 $5/$30 · Opus 4.7 $5/$25 · Gemini 3.1 Pro $2/$12 · Grok 4 $3/$15 (input/output per 1M tokens)
Basic · 40 req/day · 200 in / 300 out tokens per request (Q&A · search)
Intermediate · 40 req/day · 2,500 in / 1,200 out tokens per request (BI · coding)
Advanced · 25 req/day · 12,000 in / 2,500 out tokens per request (agents · multi-tool)
Exchange rate: R$ 5.00/USD · 22 business days/month · Official API prices as of May 2026. MDA = fixed price per user/month with a linear volume discount (100 → 500 users).
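A minimal Python sketch of the public-API formula, with the prices and workload profiles hard-coded from the assumptions listed above. `api_monthly_cost_brl` is an illustrative name, not part of any provider SDK, and real invoices may differ (for example, due to prompt caching or batch discounts, which the formula ignores).

```python
FX_BRL_PER_USD = 5.00
BUSINESS_DAYS = 22

# USD per 1M tokens (input, output), as listed in the comparison.
PRICES_USD = {
    "OpenAI GPT-5.5": (5.0, 30.0),
    "Claude Opus 4.7": (5.0, 25.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
    "Grok 4": (3.0, 15.0),
}
# tier -> (requests/day, input tokens/request, output tokens/request)
TIERS = {
    "Basic": (40, 200, 300),
    "Intermediate": (40, 2_500, 1_200),
    "Advanced": (25, 12_000, 2_500),
}


def api_monthly_cost_brl(model: str, tier: str, users: int,
                         fx: float = FX_BRL_PER_USD) -> float:
    """Cost (R$) = (in/1M x $in + out/1M x $out) x exchange rate."""
    usd_in, usd_out = PRICES_USD[model]
    req_day, tok_in, tok_out = TIERS[tier]
    requests = users * req_day * BUSINESS_DAYS       # requests per month
    millions_in = requests * tok_in / 1e6            # input tokens, in millions
    millions_out = requests * tok_out / 1e6          # output tokens, in millions
    cost_usd = millions_in * usd_in + millions_out * usd_out
    return cost_usd * fx


print(round(api_monthly_cost_brl("OpenAI GPT-5.5", "Advanced", 500)))
```

For 500 Advanced users on GPT-5.5 it returns R$ 185,625, which the table rounds to R$ 186k.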
| Model | Price (USD per 1M tokens · MDA: R$/user/month) | 100 users | 200 users | 500 users |
|---|---|---|---|---|
| Basic: Q&A · search · summary · 40 req/day · 200 in / 300 out | | | | |
| OpenAI GPT-5.5 | $5 in · $30 out | R$ 4.4k | R$ 8.8k | R$ 22.0k |
| Claude Opus 4.7 | $5 in · $25 out | R$ 3.7k | R$ 7.5k | R$ 18.7k |
| Gemini 3.1 Pro | $2 in · $12 out | R$ 1.8k | R$ 3.5k | R$ 8.8k |
| Grok 4 | $3 in · $15 out | R$ 2.2k | R$ 4.5k | R$ 11.2k |
| MDA LLM Basic | R$ 103 → R$ 89 (500+) | R$ 10.3k | R$ 19.9k | R$ 44.5k |
| Intermediate: BI · coding · lightweight agents · 40 req/day · 2,500 in / 1,200 out | | | | |
| OpenAI GPT-5.5 | $5 in · $30 out | R$ 21.3k | R$ 42.7k | R$ 107k |
| Claude Opus 4.7 | $5 in · $25 out | R$ 18.7k | R$ 37.4k | R$ 93.5k |
| Gemini 3.1 Pro | $2 in · $12 out | R$ 8.5k | R$ 17.1k | R$ 42.7k |
| Grok 4 | $3 in · $15 out | R$ 11.2k | R$ 22.4k | R$ 56.1k |
| MDA LLM Intermediate | R$ 107 → R$ 89 (500+) | R$ 10.7k | R$ 20.5k | R$ 44.5k |
| Advanced: agents · multi-tool · RAG · 25 req/day · 12,000 in / 2,500 out | | | | |
| OpenAI GPT-5.5 | $5 in · $30 out | R$ 37.1k | R$ 74.3k | R$ 186k |
| Claude Opus 4.7 | $5 in · $25 out | R$ 33.7k | R$ 67.4k | R$ 168k |
| Gemini 3.1 Pro | $2 in · $12 out | R$ 14.9k | R$ 29.7k | R$ 74.3k |
| Grok 4 | $3 in · $15 out | R$ 20.2k | R$ 40.4k | R$ 101k |
| MDA LLM Advanced | R$ 110 → R$ 89 (500+) | R$ 11.0k | R$ 20.9k | R$ 44.5k |
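The table compares fixed scenarios. A complementary view is the break-even volume: how many requests per user per day make a pay-per-token API cost exactly the same as a flat per-user price. The Python sketch below derives it from the same cost formula; `breakeven_requests_per_day` is a hypothetical helper, and the example inputs are taken from the table.

```python
def breakeven_requests_per_day(flat_price_brl: float, tok_in: int, tok_out: int,
                               usd_in: float, usd_out: float,
                               fx: float = 5.00, days: int = 22) -> float:
    """Requests/user/day at which API cost equals a flat R$/user/month price.

    Above this volume the flat price is cheaper; below it, the API is cheaper.
    """
    # Cost of one request in R$: (in x $in + out x $out) / 1M x exchange rate.
    cost_per_request_brl = (tok_in * usd_in + tok_out * usd_out) / 1e6 * fx
    return flat_price_brl / (cost_per_request_brl * days)


# Intermediate profile vs Gemini 3.1 Pro, against the R$ 89/user floor.
print(f"{breakeven_requests_per_day(89, 2_500, 1_200, 2.0, 12.0):.1f} req/day")
# Advanced profile vs OpenAI GPT-5.5, against the same floor.
print(f"{breakeven_requests_per_day(89, 12_000, 2_500, 5.0, 30.0):.1f} req/day")
```

For the Intermediate profile against Gemini 3.1 Pro the break-even is about 41.7 requests/day, just above the 40 req/day assumed in the table, which is why Gemini 3.1 Pro (R$ 42.7k) comes in slightly below MDA LLM Intermediate (R$ 44.5k) at 500 users. For the Advanced profile against GPT-5.5 it drops to about 6 requests/day.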
We compared time to first token (TTFT) for public models (standard US route) against our private infrastructure in a data center in Brazil.
Traditional dense models require extremely expensive GPU clusters. MDA LLM 2.1 combines a MoE (Mixture of Experts) architecture, which activates only 3.3B of its 32B parameters per token, with FP8 weights, so it runs on a fraction of the hardware without giving up quality, and it still processes 256k tokens of context.
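A back-of-envelope sizing sketch in Python, using the 32B total / 3.3B active parameters quoted for MDA LLM 2.1. The bytes-per-parameter and FLOPs-per-token figures are common rules of thumb (assumptions, not measurements), and the sketch ignores the KV cache, which grows with context length and matters at 256k tokens.

```python
TOTAL_PARAMS = 32e9   # total parameters (all experts)
ACTIVE_PARAMS = 3.3e9  # parameters used per token


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage only (rule of thumb: FP8 = 1 byte, BF16 = 2 bytes per parameter)."""
    return params * bytes_per_param / 1e9


def forward_gflops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params / 1e9


print(f"Weights FP8:  {weight_memory_gb(TOTAL_PARAMS, 1):.0f} GB")
print(f"Weights BF16: {weight_memory_gb(TOTAL_PARAMS, 2):.0f} GB")
print(f"Compute/token, MoE (3.3B active): {forward_gflops_per_token(ACTIVE_PARAMS):.1f} GFLOPs")
print(f"Compute/token, dense 32B:         {forward_gflops_per_token(TOTAL_PARAMS):.1f} GFLOPs")
```

In a typical MoE deployment all experts stay resident in GPU memory, so most of the memory saving comes from FP8 weights (about half the BF16 footprint), while sparse routing cuts per-token compute to roughly a tenth of an equally sized dense model.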
No more exchange rate fluctuations or USD bills that balloon at the end of the month. You pay in Brazilian reais, at a fixed cost tied to your dedicated infrastructure, so your budget is fully predictable.
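To illustrate why billing in reais matters, this Python sketch converts one USD-denominated bill from the table (the GPT-5.5 Advanced figure at 500 users, USD 37,125/month before conversion) at a few exchange rates and sets it beside the flat R$ 89 × 500 users price. Apart from R$ 5.00/USD, the rates are hypothetical and chosen only to show sensitivity.

```python
API_BILL_USD = 37_125.0     # 500 users x 25 req/day x 22 days, 12,000 in / 2,500 out, $5/$30
FLAT_BILL_BRL = 500 * 89.0  # R$ 44,500/month, fixed in reais


def usd_bill_in_brl(bill_usd: float, brl_per_usd: float) -> float:
    """Reais owed for a USD-denominated bill at a given exchange rate."""
    return bill_usd * brl_per_usd


for fx in (5.00, 5.50, 6.00):  # R$ per USD; 5.50 and 6.00 are hypothetical
    api_brl = usd_bill_in_brl(API_BILL_USD, fx)
    print(f"R$ {fx:.2f}/USD -> API R$ {api_brl:,.0f} | flat R$ {FLAT_BILL_BRL:,.0f}")
```

The API bill in reais moves one-for-one with the exchange rate, while the reais-denominated price stays constant.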
Your data never leaves Brazil. The stack runs on a private VPC (10.20.0.0/16) with LiteLLM proxies and isolated vLLM engines. Immutable audit logs, RBAC, compliance by design.
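As a small illustration of the network boundary, the Python sketch below uses the standard `ipaddress` module to check that an endpoint IP falls inside the 10.20.0.0/16 VPC range before any data is sent. The function names and the sample IP are hypothetical; in practice this kind of control is enforced mainly at the network layer (routing, security groups), not only in client code.

```python
import ipaddress

# Private VPC range quoted for the stack.
VPC_CIDR = ipaddress.ip_network("10.20.0.0/16")


def is_inside_vpc(host_ip: str) -> bool:
    """True if the address belongs to the private VPC range."""
    return ipaddress.ip_address(host_ip) in VPC_CIDR


def assert_private_endpoint(host_ip: str) -> None:
    """Hypothetical guard: refuse to send prompts to endpoints outside the VPC."""
    if not is_inside_vpc(host_ip):
        raise ValueError(f"{host_ip} is outside {VPC_CIDR}; refusing to send data")


# Hypothetical IP of a LiteLLM proxy inside the VPC: passes silently.
assert_private_endpoint("10.20.4.17")
```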
We don't use giant generic models for specific tasks. We use the right computing power for the right problem, applying fine-tuning and RAG with your company's data.
Switching from public LLMs to private SLMs requires architecture design, data orchestration, quantization, and fine-tuning. If your team doesn't have the bandwidth for this, MDA Consulting can handle it for you.
See how OPEX (operational expenses) transforms into smart CAPEX (private capacity).
You're not just switching providers. You're changing the economic model of AI consumption.
Book a 30-minute demo. You'll see MDA LLM 2.1 running on a dedicated instance, with your own use cases and live benchmarks.