Traditional billing systems sit at the end of the pipeline — they receive events after the fact and generate invoices days later. Aforo sits at the edge. It intercepts every API request, makes a sub-5ms decision (allow, throttle, or block), and asynchronously meters the usage. The gateway handles traffic. Aforo handles the business logic.
INFO
Think of Aforo as a financial co-processor for your API infrastructure. Your gateway routes packets. Aforo routes revenue.
This separation means your engineering team never writes billing code. Pricing changes, new tiers, custom enterprise deals, margin floors — all configured by Product and Finance teams in the Aforo Admin UI. Zero Jira tickets. Zero deploys.
Meters are sensors that capture raw usage events. Every API call, token consumed, storage byte written, or agent session initiated generates a usage event that flows into the Metering Engine.
API Calls: POST /v1/search, 1 call
Token Consumption: GPT-4o, 2,400 input + 800 output tokens
Storage Events: S3 PUT, 4.2 MB uploaded
Agent Sessions: MCP tool_call, SmartSearch.query
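The following minimal Python sketch expresses the four examples above as usage events and rolls them up per meter. The `UsageEvent` fields, the meter names, and the tenant `acme` are hypothetical. Aforo's actual ingestion schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class UsageEvent:
    # Hypothetical event shape; field names are illustrative, not Aforo's schema.
    tenant_id: str
    meter: str        # which meter produced the event
    quantity: float   # amount consumed, expressed in `unit`
    unit: str         # "calls", "tokens", "MB", "sessions", ...
    dimensions: dict = field(default_factory=dict)  # extra attributes used for pricing
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def rollup(events):
    """Sum quantities per (tenant, meter, unit), as a metering engine might before rating."""
    totals = defaultdict(float)
    for e in events:
        totals[(e.tenant_id, e.meter, e.unit)] += e.quantity
    return dict(totals)


# The four examples above, expressed as events for one hypothetical tenant.
events = [
    UsageEvent("acme", "api_calls", 1, "calls", {"route": "POST /v1/search"}),
    UsageEvent("acme", "tokens", 2_400, "tokens", {"model": "gpt-4o", "direction": "input"}),
    UsageEvent("acme", "tokens", 800, "tokens", {"model": "gpt-4o", "direction": "output"}),
    UsageEvent("acme", "storage", 4.2, "MB", {"op": "S3 PUT"}),
    UsageEvent("acme", "agent_sessions", 1, "sessions", {"tool": "SmartSearch.query"}),
]

totals = rollup(events)  # token total for acme: 3,200
```

The sketch stores token direction as a dimension instead of a separate meter. Input and output tokens can then carry different prices and still roll up to one token total.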
PRO TIP
Meters are asynchronous and non-blocking. Usage ingestion happens after the API response is sent. Your P99 latency is never impacted by metering.
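This non-blocking behavior can be sketched with `asyncio`. The request path only enqueues the event, and a background task makes the slow ingestion call. `AsyncMeter`, its bounded in-memory queue, and the `sink` callable are hypothetical stand-ins, not Aforo components. Dropping events when the queue is full is one possible overload policy. It is not documented behavior.

```python
import asyncio


class AsyncMeter:
    """Buffers usage events in memory; a background task ships them to a sink."""

    def __init__(self, sink, maxsize=10_000):
        self._queue = asyncio.Queue(maxsize=maxsize)
        self._sink = sink   # hypothetical async callable that ingests one event
        self.dropped = 0
        self.failed = 0

    def record(self, event):
        # Called on the request path. put_nowait never waits, so it adds no I/O latency.
        try:
            self._queue.put_nowait(event)
        except asyncio.QueueFull:
            self.dropped += 1   # shed usage data rather than stall the response

    async def run(self):
        # Background worker: the slow ingestion call happens here, off the hot path.
        while True:
            event = await self._queue.get()
            try:
                await self._sink(event)
            except Exception:
                self.failed += 1   # a real system would retry or dead-letter
            finally:
                self._queue.task_done()

    async def flush(self):
        await self._queue.join()


async def handle_request(meter, tenant_id):
    response = {"status": 200, "body": "search results"}
    meter.record({"tenant": tenant_id, "meter": "api_calls", "quantity": 1})
    return response   # returned before the event has been ingested


async def main():
    ingested = []

    async def sink(event):
        await asyncio.sleep(0.01)   # stand-in for a network call to the metering backend
        ingested.append(event)

    meter = AsyncMeter(sink)
    worker = asyncio.create_task(meter.run())
    response = await handle_request(meter, "acme")
    ingested_at_response = len(ingested)   # nothing ingested yet
    await meter.flush()
    worker.cancel()
    return response, ingested_at_response, ingested


response, ingested_at_response, ingested = asyncio.run(main())
```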
Entitlements are the real-time gatekeeper. Before an API request is processed, the gateway calls Aforo's edge cache to check: Does this tenant have access to this feature? Have they exceeded their quota? Is their subscription active?
This check happens in <5ms because entitlements are cached in Redis at the gateway edge. No database round-trip. No cross-service call. The cache is refreshed every 30 seconds via a background sync job.
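The gatekeeper logic and its cache can be sketched together in Python. An in-process dictionary stands in for the gateway-local Redis, and a daemon thread plays the role of the 30-second background sync job. `Entitlement`, `check`, `EntitlementCache`, and the `fetch_all` callable are hypothetical names. `fetch_all` would pull the current snapshot from Aforo. This sketch checks subscription status before feature access and quota. That order is its own design choice, because an inactive tenant is rejected either way.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    # Hypothetical cached record for one tenant.
    active: bool
    features: frozenset
    quota: int   # units allowed in the current billing period
    used: int    # units consumed so far, as of the last sync


def check(ent, feature, cost=1):
    """The three gatekeeper questions, answered from the cached record with no I/O."""
    if ent is None or not ent.active:
        return "deny: subscription inactive"
    if feature not in ent.features:
        return "deny: feature not entitled"
    if ent.used + cost > ent.quota:
        return "deny: quota exceeded"
    return "allow"


class EntitlementCache:
    """In-process stand-in for the gateway-local Redis cache."""

    def __init__(self, fetch_all, interval_s=30.0):
        self._fetch_all = fetch_all   # hypothetical: returns {tenant_id: Entitlement}
        self._interval_s = interval_s
        self._data = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def get(self, tenant_id):
        # Hot path: a local dictionary read, with no network round-trip.
        with self._lock:
            return self._data.get(tenant_id)

    def sync_once(self):
        # The only remote call, made by the background job rather than per request.
        snapshot = dict(self._fetch_all())
        with self._lock:
            self._data = snapshot   # swap in the whole snapshot at once

    def _loop(self):
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval_s):
            try:
                self.sync_once()
            except Exception:
                pass   # keep serving the last good snapshot

    def start(self):
        self.sync_once()   # populate before serving traffic
        thread = threading.Thread(target=self._loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()


# Usage with a hypothetical upstream snapshot.
upstream = {"acme": Entitlement(True, frozenset({"search"}), quota=1_000, used=999)}
cache = EntitlementCache(lambda: upstream)
cache.sync_once()
decisions = [
    check(cache.get("acme"), "search"),           # "allow"
    check(cache.get("acme"), "search", cost=2),   # "deny: quota exceeded"
    check(cache.get("acme"), "export"),           # "deny: feature not entitled"
    check(cache.get("globex"), "search"),         # "deny: subscription inactive" (unknown tenant)
]
```

The request path only reads the last snapshot. A suspended subscription or a just-exhausted quota can therefore take up to one refresh interval to take effect at the gateway. The interval trades that staleness against sync load.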
Margin Guards are circuit breakers for profitability. When a tenant's usage cost (COGS) approaches or exceeds their contract revenue, Aforo automatically intervenes — before the damage hits your P&L.
Meters vs. Entitlements vs. Margin Guards
| Concept | Role | When It Runs | Latency Impact |
|---|---|---|---|
| Meters | Sensors that capture raw usage events | Post-response (async) | 0ms (fully async) |
| Entitlements | Gatekeeper that checks access and quota | Pre-request (sync) | <5ms (edge cache) |
| Margin Guards | Circuit breaker for profitability | Pre-request (sync) | <5ms (same cache lookup) |
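The ordering in the table can be sketched as a single gateway handler. One cache read answers both pre-request checks, and the backend is called only if both pass. Metering then goes to a worker pool, so the handler does not wait for it. Several parts of this sketch are assumptions:

- the record fields;
- the use of `403` for an entitlement denial, since the text does not specify a status code;
- a `ThreadPoolExecutor` standing in for the gateway's async metering pipeline.

The `429` for a negative margin follows the Level 3 behavior described below.

```python
from concurrent.futures import ThreadPoolExecutor


def handle(request, cache, upstream, meter, pool, log):
    # Pre-request (sync): one cache lookup serves both the entitlement and margin checks.
    record = cache.get(request["tenant"])
    log.append("lookup")
    if record is None or not record["active"]:
        return 403, "Forbidden"               # assumed status for an entitlement denial
    if record["margin"] < 0:
        return 429, "Margin Limit Exceeded"   # Level 3 margin guard
    # Only requests that pass both checks reach the backend.
    status, body = upstream(request)
    log.append("upstream")
    # Post-response (async): hand metering to the pool and return without waiting.
    pool.submit(meter, request, log)
    return status, body


def backend(request):
    return 200, "ok"   # stand-in for the real service


def emit_usage(request, log):
    log.append("meter")   # stand-in for sending a usage event


cache = {  # hypothetical cached records: entitlement state plus current margin
    "acme": {"active": True, "margin": 0.35},
    "loss": {"active": True, "margin": -0.02},
}
log = []
with ThreadPoolExecutor(max_workers=1) as pool:
    ok = handle({"tenant": "acme"}, cache, backend, emit_usage, pool, log)
    blocked = handle({"tenant": "loss"}, cache, backend, emit_usage, pool, log)
# Leaving the `with` block waits for the queued metering task to finish.
```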
The Three Intervention Levels
MARGIN GUARD / All Gateways
Level 1 — Warning (Margin < 20%): Aforo fires a webhook alert to the account manager. The API request proceeds normally. The customer is unaffected — but your Finance team knows a contract is trending unprofitable.
Level 2 — Throttle (Margin < 10%): Aforo injects an X-Aforo-Throttle: true header into the API response. Your application reads this header and switches to a lower-cost compute path (e.g., GPT-4o → GPT-4o-mini). The customer gets a response — but at a reduced cost to you.
Level 3 — Block (Margin < 0%): Aforo rejects the request at the gateway with a 429 Margin Limit Exceeded response. The request never reaches your backend. This is the financial kill switch — unprofitable traffic is stopped before it generates compute costs.
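The three levels amount to a threshold ladder on margin. The Python sketch below assumes that margin means gross margin on contract revenue, (revenue - COGS) / revenue. The text gives the thresholds but not the formula. `MarginAction`, `margin`, `classify`, and `choose_model` are hypothetical names. `choose_model` shows the application side of Level 2. It reads `X-Aforo-Throttle` case-insensitively, because HTTP header names are case-insensitive.

```python
from enum import Enum


class MarginAction(Enum):
    ALLOW = "allow"
    WARN = "warn"           # Level 1: webhook alert, request proceeds
    THROTTLE = "throttle"   # Level 2: X-Aforo-Throttle: true header
    BLOCK = "block"         # Level 3: 429 Margin Limit Exceeded at the gateway


def margin(revenue, cogs):
    # Assumed definition: gross margin as a fraction of contract revenue.
    if revenue <= 0:
        raise ValueError("contract revenue must be positive")
    return (revenue - cogs) / revenue


def classify(m):
    # Check the most severe level first so each margin maps to exactly one action.
    if m < 0.0:
        return MarginAction.BLOCK
    if m < 0.10:
        return MarginAction.THROTTLE
    if m < 0.20:
        return MarginAction.WARN
    return MarginAction.ALLOW


def choose_model(headers):
    # Application side of Level 2: normalize header names before looking one up.
    normalized = {k.lower(): v for k, v in headers.items()}
    throttled = normalized.get("x-aforo-throttle", "").strip().lower() == "true"
    return "gpt-4o-mini" if throttled else "gpt-4o"


# A hypothetical $1,000 contract at four COGS levels.
levels = [classify(margin(1_000, c)) for c in (700, 850, 950, 1_100)]
# → [ALLOW, WARN, THROTTLE, BLOCK]
```

Because the most severe threshold is checked first, the ladder stays unambiguous. A 5% margin is below both 20% and 10%, but it maps only to THROTTLE.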
WARNING
Margin Guards operate on real-time COGS data. If your AI provider raises prices mid-month, Aforo detects the margin compression immediately — not 45 days later when the invoice arrives.
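A worked example shows why pricing COGS at current rates matters. The sketch reuses the GPT-4o request shape from the metering examples (2,400 input + 800 output tokens). It then prices the same month of usage under two rate cards. The per-million-token prices, request volume, and $1,000 contract value are illustrative assumptions. They are not quoted provider prices or real contract data.

```python
def monthly_cogs_and_margin(revenue, requests, in_tokens, out_tokens, in_price, out_price):
    """Prices are in dollars per 1M tokens; returns (COGS, margin as a fraction of revenue)."""
    per_request = in_tokens / 1_000_000 * in_price + out_tokens / 1_000_000 * out_price
    cogs = requests * per_request
    return cogs, (revenue - cogs) / revenue


# Same usage (60,000 requests of 2,400 input + 800 output tokens), two illustrative rate cards.
before = monthly_cogs_and_margin(1_000, 60_000, 2_400, 800, in_price=2.50, out_price=10.00)
after = monthly_cogs_and_margin(1_000, 60_000, 2_400, 800, in_price=2.75, out_price=11.00)
# before ≈ ($840, 16% margin); after a 10% price rise ≈ ($924, 7.6% margin)
```

Under the thresholds above, this contract moves from Level 1 (warning) to Level 2 (throttle) as soon as the new rates apply, even though usage has not changed.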
Aforo adds zero latency to your hot path. The entitlement check reads from a local Redis instance co-located with your gateway. Usage metering fires asynchronously after the response is returned. Your P99 is untouched.