ROI Calculator · MDA LLM 2.1

How much does your company save with private AI?

Payments in dollars, exchange rate fluctuations, and per-token billing all drive up your bill at the end of the month. The cost of public LLMs can make scaling your AI operation unfeasible. Find out how much your company saves by running agents on a Private Cloud with optimized SLMs.

Payments in USD · exchange rate fluctuations
Cost per token · unpredictable
High latency · US routing

Savings of up to 85%
Average payback: 4.2 months
100% in Brazil

The Math of Tokens

Not all AI consumes the same amount of tokens.

Before calculating, note that automation-agent tasks consume several times more tokens per request than general tasks. The figures below are approximate and vary by use case.

Customer Service / Q&A

~20–50 tokens / request

Summaries, short questions and answers, ticket classification. High volume of requests, but each one is compact. Ideal for an optimized SLM.

Coding / Data / BI

~50–100 tokens / request

Code generation, ETL, SQL queries, BI analytics. Structured technical context and responses with multi-step logic.

Automation agents

~150–300 tokens / request

SDR, conversational BI, multiple tool calls with RAG and multi-turn reasoning. Each agent step multiplies consumption.

Interactive Calculator

Simulate your company's scenario.

Adjust the controls below. The cost, savings, and graph update in real time as you change the volume and operation tier.

Active users: 50 · 100 · 200 · 350 · 500 (default: 100 employees)
Operation tier: Intermediate (BI · coding · lightweight agents)
Big-Tech (average): monthly cost · APIs per token
MDA LLM (private): monthly cost · cost per user
Average savings: % vs public APIs
Chart: Monthly cost · 12-month projection (BRL · log scale) for OpenAI GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4, and MDA LLM (private)
Also shown: Allocated GPUs · Ratio vs Big-Tech · Payback
Breakdown by model: estimated monthly cost for OpenAI GPT-5.5, Claude Opus 4.7, Gemini 3.1 Pro, and Grok 4, with MDA LLM private as the base

Assumptions and formulas used in the calculation
MDA LLM (private)

Fixed price per user/month · Dedicated GPU · vLLM · MDA LLM 2.1 (MoE FP8, 32B total · 3.3B active · 256k context) · Brazil data center · 99.5% SLA.

Price = users × price/user/month (with linear volume discount)

List (100 users): Basic R$ 103 · Intermediate R$ 107 · Advanced R$ 110

Floor (500+ users): R$ 89 (~USD 17) across all tiers · the discount interpolates linearly between 100 and 500 users.
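
The pricing rule above can be sketched in a few lines. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, and holding the list price below 100 users is an assumption, since the page only defines the discount between 100 and 500 users.

```python
# List prices (R$ per user/month at 100 users) and the 500+ floor, as stated above.
LIST_PRICE_BRL = {"basic": 103.0, "intermediate": 107.0, "advanced": 110.0}
FLOOR_PRICE_BRL = 89.0
DISCOUNT_START, DISCOUNT_END = 100, 500


def mda_price_per_user(tier: str, users: int) -> float:
    """Per-user monthly price in R$ after the linear volume discount."""
    list_price = LIST_PRICE_BRL[tier]
    # Assumption: list price at or below 100 users, floor at or above 500.
    clamped = min(max(users, DISCOUNT_START), DISCOUNT_END)
    progress = (clamped - DISCOUNT_START) / (DISCOUNT_END - DISCOUNT_START)
    return list_price - (list_price - FLOOR_PRICE_BRL) * progress


def mda_monthly_cost(tier: str, users: int) -> float:
    """Total monthly cost in R$: users x discounted per-user price."""
    return users * mda_price_per_user(tier, users)
```

For 200 Basic users this gives R$ 99.50 per user, or R$ 19,900 per month, which lines up with the R$ 19.9k shown in the reference table.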

Includes dedicated GPU infrastructure (80 Basic · 40 Intermediate · 15 Advanced users/GPU), operation, SLA, and support. Standard business hours (9 AM–10 PM, 22 days), fair-use policy.
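
The users-per-GPU capacities translate into a GPU count roughly as follows. This is a sketch under one assumption not stated on the page: a partially filled GPU is still allocated as a whole dedicated GPU, so the count rounds up. The function name is hypothetical.

```python
# Capacity per dedicated GPU for each tier, as listed above.
USERS_PER_GPU = {"basic": 80, "intermediate": 40, "advanced": 15}


def allocated_gpus(tier: str, users: int) -> int:
    """Number of dedicated GPUs needed for a given user count."""
    capacity = USERS_PER_GPU[tier]
    # Integer ceiling division; assumes fractional GPUs are rounded up.
    return -(-users // capacity)
```

With 100 users, this yields 2 GPUs for Basic, 3 for Intermediate, and 7 for Advanced.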

Big-Tech (APIs per token)

Enterprise Frontier 2026 pricing · USD per 1M tokens · billed based on actual usage.

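Tokens/month = users × requests/day × tokens/request × 22 business days (computed separately for input and output tokens)

Cost = (in/1M × $/1M_in + out/1M × $/1M_out) × R$ 5/USD, where in and out are the monthly input and output token counts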

GPT-5.5 $5/$30 · Opus 4.7 $5/$25 · Gemini 3.1 Pro $2/$12 · Grok 4 $3/$15 (input/output, USD per 1M tokens)

Volume per tier (req/day · tokens in/out)

Basic · 40 req · 200 in / 300 out (Q&A · search)

Intermediate · 40 req · 2,500 in / 1,200 out (BI · coding)

Advanced · 25 req · 12,000 in / 2,500 out (agents · multi-tool)
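
The per-token formula and the tier volumes above combine as in the sketch below. The function and dictionary names are hypothetical. The per-1M prices are the ones shown in the reference table on this page, since those are the prices that reproduce the table's monthly figures.

```python
BUSINESS_DAYS = 22
BRL_PER_USD = 5.0

# tier -> (requests per user per day, input tokens, output tokens per request)
TIERS = {
    "basic": (40, 200, 300),
    "intermediate": (40, 2_500, 1_200),
    "advanced": (25, 12_000, 2_500),
}

# model -> (USD per 1M input tokens, USD per 1M output tokens), per the reference table
PRICES_USD_PER_1M = {
    "OpenAI GPT-5.5": (5.0, 30.0),
    "Claude Opus 4.7": (5.0, 25.0),
    "Gemini 3.1 Pro": (2.0, 12.0),
    "Grok 4": (3.0, 15.0),
}


def api_monthly_cost_brl(model: str, tier: str, users: int) -> float:
    """Monthly per-token API cost in R$ for one model, tier, and user count."""
    req_per_day, tokens_in, tokens_out = TIERS[tier]
    price_in, price_out = PRICES_USD_PER_1M[model]
    requests = users * req_per_day * BUSINESS_DAYS
    usd = (requests * tokens_in * price_in + requests * tokens_out * price_out) / 1_000_000
    return usd * BRL_PER_USD
```

For example, 100 Basic users on OpenAI GPT-5.5 send 17.6M input and 26.4M output tokens a month, which costs USD 880, or R$ 4,400 at R$ 5/USD.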

Complete Reference Table

Monthly cost by tier, plan, and volume.

Exchange rate: R$ 5.00/USD · 22 business days/month · Official API prices as of May 2026. MDA = fixed price per user/month with a linear volume discount (100 → 500 users).

Model | Price / 1M tokens | 100 users | 200 users | 500 users

Basic · Q&A · search · summary · 40 req/day · 200 in / 300 out
OpenAI GPT-5.5 | $5 in · $30 out | R$ 4.4k | R$ 8.8k | R$ 22.0k
Claude Opus 4.7 | $5 in · $25 out | R$ 3.7k | R$ 7.5k | R$ 18.7k
Gemini 3.1 Pro | $2 in · $12 out | R$ 1.8k | R$ 3.5k | R$ 8.8k
Grok 4 | $3 in · $15 out | R$ 2.2k | R$ 4.5k | R$ 11.2k
MDA LLM Basic | R$ 103/user → R$ 89 (~USD 17) at 500+ | R$ 10.3k | R$ 19.9k | R$ 44.5k

Intermediate · BI · coding · lightweight agents · 40 req/day · 2,500 in / 1,200 out
OpenAI GPT-5.5 | $5 in · $30 out | R$ 21.3k | R$ 42.7k | R$ 107k
Claude Opus 4.7 | $5 in · $25 out | R$ 18.7k | R$ 37.4k | R$ 93.5k
Gemini 3.1 Pro | $2 in · $12 out | R$ 8.5k | R$ 17.1k | R$ 42.7k
Grok 4 | $3 in · $15 out | R$ 11.2k | R$ 22.4k | R$ 56.1k
MDA LLM Intermediate | R$ 107/user → R$ 89 (~USD 17) at 500+ | R$ 10.7k | R$ 20.5k | R$ 44.5k

Advanced · agents · multi-tool · RAG · 25 req/day · 12,000 in / 2,500 out
OpenAI GPT-5.5 | $5 in · $30 out | R$ 37.1k | R$ 74.3k | R$ 186k
Claude Opus 4.7 | $5 in · $25 out | R$ 33.7k | R$ 67.4k | R$ 168k
Gemini 3.1 Pro | $2 in · $12 out | R$ 14.9k | R$ 29.7k | R$ 74.3k
Grok 4 | $3 in · $15 out | R$ 20.2k | R$ 40.4k | R$ 101k
MDA LLM Advanced | R$ 110/user → R$ 89 (~USD 17) at 500+ | R$ 11.0k | R$ 20.9k | R$ 44.5k

How to read: per-token APIs (OpenAI, Claude, Gemini, Grok) scale with users × requests × tokens × days. MDA is a fixed price per user/month (Basic R$ 103 · Intermediate R$ 107 · Advanced R$ 110, with a linear discount down to R$ 89 (~USD 17) at 500 users). The price includes dedicated GPU infrastructure (MDA LLM 2.1 · MoE FP8 on vLLM · 32B total · 3.3B active · 256k context · capacity: 80 / 40 / 15 users per GPU), operation, a 99.5% SLA, and support. Data sovereignty is guaranteed by hosting in a data center in Brazil.
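
To turn any pair of table cells into a savings percentage, a sketch like the one below is enough. Averaging the four providers with an arithmetic mean is an assumption about how the "Big-Tech (average)" figure is built. The function names are hypothetical, and the inputs are the rounded monthly costs from the table.

```python
from statistics import mean


def savings_pct(api_cost_brl: float, mda_cost_brl: float) -> float:
    """Percentage saved with MDA relative to an API bill; negative if MDA costs more."""
    return (1.0 - mda_cost_brl / api_cost_brl) * 100.0


def average_savings_pct(api_costs_brl: list[float], mda_cost_brl: float) -> float:
    """Savings against the mean of several providers (assumed averaging rule)."""
    return savings_pct(mean(api_costs_brl), mda_cost_brl)


# Advanced tier, 500 users: rounded monthly costs in R$ taken from the table.
advanced_500_apis = [186_000, 168_000, 74_300, 101_000]
advanced_500_mda = 44_500
```

Calling `average_savings_pct(advanced_500_apis, advanced_500_mda)` gives roughly 66% for the Advanced tier at 500 users. A negative result, as in the Basic tier at these volumes, means the per-token API is the cheaper option for that cell.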
Benchmark · Actual Latency

Cost is irrelevant if the AI takes too long to respond.

We compared the latency of public models (standard US route) with our private infrastructure in a data center in Brazil. The metric is time to first token (TTFT).

Workload | Tokens | OpenAI / Claude | MDA LLM 2.1 · BR
Q&A · Summaries | ~20–50 tokens | ~2.0s | 2.11s · P95: 3.48s
Coding · BI · ETL | ~50–100 tokens | ~2.8s | 2.23s · P95: 4.18s
Agents · Tool-calls | ~150–300 tokens | ~4.8s+ | 4.19s · P95: 8.22s

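
The page does not describe its benchmark harness, so the following is only a hypothetical sketch of how TTFT and a P95 figure could be collected. `start_stream` stands in for any callable that issues a request and returns an iterator of streamed chunks, and the nearest-rank percentile is one common definition among several.

```python
import time
from typing import Callable, Iterable, Sequence


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> float:
    """Seconds from issuing the request until the first streamed chunk arrives."""
    started = time.perf_counter()
    for _chunk in start_stream():
        # Stop at the first chunk: that is the time to first token.
        return time.perf_counter() - started
    raise RuntimeError("stream ended without producing a token")


def percentile_nearest_rank(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of the samples, e.g. pct=95 for P95."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer form of ceil(pct * n / 100), clamped to at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]
```

Running `measure_ttft` many times per workload and passing the results to `percentile_nearest_rank(samples, 95)` would yield numbers of the same shape as the ones reported here.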
Sustainable throughput: 3,000 TPS
Concurrent users: 84–97
Stable per user: 30+ TPS
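
The three throughput figures can be related with simple arithmetic. The sketch assumes aggregate throughput is concurrent users multiplied by the per-user rate, which the page does not state explicitly. The function names are hypothetical.

```python
def aggregate_tps(concurrent_users: int, tps_per_user: int) -> int:
    """Total tokens per second if every concurrent user streams at the same rate."""
    return concurrent_users * tps_per_user


def max_concurrent_users(total_tps: int, tps_per_user: int) -> int:
    """How many users a total budget supports at a given per-user rate."""
    return total_tps // tps_per_user
```

Under that assumption, 84 to 97 users at 30 TPS each come to 2,520 to 2,910 TPS, and a 3,000 TPS budget supports 100 users at 30 TPS, so the figures are roughly consistent with each other.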
Memory & Allocation

The difference between 70B parameters and 3.3B activated per token.

Dense 70B-class models keep every parameter active for every token and need expensive multi-GPU clusters. The MoE (Mixture of Experts) architecture of MDA LLM 2.1 activates only 3.3B of its 32B parameters per token, uses a fraction of the VRAM while maintaining the same quality, and still processes 256k tokens of context.

Traditional · 70B

Dense A100/H100 cluster

Dense models with 70B+ parameters · 32k–128k context
140GB+ VRAM
  • All parameters activated for every token
  • Expensive GPU cluster · high USD costs
  • Throughput limited to dozens of concurrent tasks
MoE · MDA LLM 2.1

Quantized FP8 · vLLM

Mixture of Experts · 32B total · 3.3B active · 256k context
~25GB VRAM
  • Only 3.3B parameters activated per token
  • 256k context tokens · fits an entire book
  • Fits on affordable GPUs · fixed BRL cost
  • 84–97 concurrent users with >30 stable TPS
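
A rough weights-only estimate shows where figures like these come from. This is a back-of-the-envelope sketch that assumes 2 bytes per parameter for FP16 and 1 byte for FP8, and ignores KV cache, activations, and runtime overhead. The function name is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes per GB
    return params_billions * BYTES_PER_PARAM[dtype]
```

A dense 70B model at FP16 comes to 140 GB, matching the figure above. For 32B parameters at FP8 the same estimate gives 32 GB, somewhat above the ~25GB quoted, so that figure presumably rests on details this page does not state. In an MoE model all experts must stay resident in memory, so the 3.3B active parameters reduce compute per token rather than the weight footprint.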
For C-Level Executives

Why CFOs, CIOs, and CTOs choose MDA.

For the CFO

No surprises on your credit card.

No more exchange rate fluctuations or USD bills that double at the end of the month. You pay in Brazilian reais, with a fixed cost tied to your infrastructure. Absolute budget predictability.

0% exchange rate volatility
For the CIO

Privacy and LGPD compliance guaranteed.

Your data never leaves Brazil. The stack runs on a private VPC (10.20.0.0/16) with LiteLLM proxies and isolated vLLM engines. Immutable audit logs, RBAC, compliance by design.

100% Brazilian data centers
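
One simple way to audit the network boundary described above is to check that every inference endpoint address falls inside the VPC range. The sketch uses Python's standard `ipaddress` module with the 10.20.0.0/16 block mentioned on this page. The helper name and any addresses passed to it are hypothetical.

```python
import ipaddress

# Private VPC range quoted above.
VPC_CIDR = ipaddress.ip_network("10.20.0.0/16")


def inside_vpc(address: str) -> bool:
    """True if the given IPv4 address belongs to the private VPC range."""
    return ipaddress.ip_address(address) in VPC_CIDR
```

For instance, `inside_vpc("10.20.3.7")` holds, while an address such as `10.21.0.1` or any public IP falls outside the range.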
For the CTO

SLMs trained for your business.

We don't use giant generic models for specific tasks. We use the right computing power for the right problem, applying fine-tuning and RAG with your company's data.

LoRA + Native RAG
MDA Consulting

You've calculated the savings. But what about the migration?

Switching from public LLMs to private SLMs requires architecture work, data orchestration, quantization, and fine-tuning. If your team doesn't have the bandwidth for this, MDA Consulting can handle it for you.

Use Case Assessment: tier · volume · latency
GPU Architecture: vLLM + LiteLLM + Qdrant
Fine-tuning with your data: LoRA · QLoRA · RAG
SLA Guarantee: contractual cost + latency
I want an architecture assessment · 60-minute diagnosis · no obligation
🚀 Visualize the shift

Migrating to SLMs in 3 layers.

See how OPEX (operational expenses) transforms into smart CAPEX (private capacity).

01 · As-is

Public API

  • Generalist LLMs: 70B–200B parameters
  • Variable, exchange rate-based cost: USD per token · no cap
  • International latency: US routing · unstable at peak
02 · To-be · transition

Quantization & optimization

  • MDA LLM 2.1 · MoE: 32B total · ~3.3B active · 256k context
  • FP8 quantization: 50% less memory than FP16 · near-lossless
  • RAG + fine-tuning: LoRA · QLoRA · your data
03 · Deployment

MyDatAgent private

  • Private Cloud BR: Dedicated GPU · LGPD ready
  • 3,000 sustainable TPS: Fixed cost in BRL · unlimited usage
  • Specialized SLMs: by department · context

You're not just switching providers. You're changing the economic model of AI consumption.

Open slots for Q2 / 2026

Ready to say goodbye to dollar-denominated bills?

Book a 30-minute demo. You'll see MDA LLM 2.1 running on a dedicated instance, with your own use cases and live benchmarks.

Pay by credit card · Response within 4 hours · NDA available