Core Concepts: Monetization as Infrastructure

Aforo is a decoupled orchestration plane. It handles pricing, margins, and access so your gateways only handle traffic.

Updated 2026-06-15

The Mental Model

Traditional billing systems sit at the end of the pipeline — they receive events after the fact and generate invoices days later. Aforo sits at the edge. It intercepts every API request, makes a sub-5ms decision (allow, throttle, or block), and asynchronously meters the usage. The gateway handles traffic. Aforo handles the business logic.

INFO
Think of Aforo as a financial co-processor for your API infrastructure. Your gateway routes packets. Aforo routes revenue.

This separation means your engineering team never writes billing code. Pricing changes, new tiers, custom enterprise deals, margin floors — all configured by Product and Finance teams in the Aforo Admin UI. Zero Jira tickets. Zero deploys.

Pillar 1: Meters

Meters are sensors that capture raw usage events. Every API call, token consumed, storage byte written, or agent session initiated generates a usage event that flows into the Metering Engine.

API Calls: POST /v1/search (1 call)
Token Consumption: GPT-4o (2,400 input + 800 output)
Storage Events: S3 PUT (4.2 MB uploaded)
Agent Sessions: MCP tool_call (SmartSearch.query)

PRO TIP
Meters are asynchronous and non-blocking. Usage ingestion happens after the API response is sent. Your P99 latency is never impacted by metering.
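
A minimal sketch of how a non-blocking meter could be structured, assuming an in-memory queue drained by a background thread. `UsageEvent`, `AsyncMeter`, and the `sink` callable are hypothetical names for illustration, not Aforo's SDK or event schema.

```python
import queue
import threading
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsageEvent:
    """One raw usage event. Field names are illustrative assumptions."""
    tenant_id: str
    meter: str          # e.g. "api_calls", "tokens", "storage_bytes"
    quantity: float
    timestamp: float = field(default_factory=time.time)


class AsyncMeter:
    """Buffers events in memory and hands them to a sink off the request path."""

    def __init__(self, sink, max_buffer=10_000):
        self._sink = sink  # callable receiving each event (an HTTP POST in a real plugin)
        self._queue = queue.Queue(maxsize=max_buffer)
        threading.Thread(target=self._drain, daemon=True).start()

    def record(self, event):
        """Called after the response is sent: enqueue and return immediately."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False  # drop rather than block; a real plugin might spill to disk

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._sink(event)
            except Exception:
                pass  # a real plugin would log and retry here
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until every queued event has been handed to the sink."""
        self._queue.join()


# Usage: two of the example events listed above.
received = []
meter = AsyncMeter(sink=received.append)
meter.record(UsageEvent("tenant-42", "api_calls", 1))
meter.record(UsageEvent("tenant-42", "tokens", 2400 + 800))
meter.flush()
```

The design choice here is that `record` never waits on the network: the only work on the caller's thread is a queue insert, and a full buffer drops the event instead of stalling the request.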

Pillar 2: Entitlements

Entitlements are the real-time gatekeeper. Before an API request is processed, the gateway calls Aforo's edge cache to check: Does this tenant have access to this feature? Have they exceeded their quota? Is their subscription active?

canAccess-response.json
{
  "allowed": true,
  "plan": "enterprise",
  "feature": "advanced_search",
  "remaining_quota": 8420,
  "quota_reset": "2026-04-01T00:00:00Z",
  "entitlements": {
    "max_results_per_query": 1000,
    "real_time_indexing": true,
    "custom_ranking": true
  }
}

This check happens in <5ms because entitlements are cached in Redis at the gateway edge. No database round-trip. No cross-service call. A background sync job refreshes the cache every 30 seconds, so a plan or quota change can take up to 30 seconds to reach the edge.
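
The lookup described above can be sketched as follows. This is an illustration under stated assumptions: a plain dict stands in for the Redis edge cache, `fetch_all` is a hypothetical loader for the source of truth, and the `reason` field on denials is invented for the example (it does not appear in the sample response above).

```python
import time

REFRESH_SECONDS = 30  # refresh interval stated in the text above


class EntitlementCache:
    """Local cache keyed by (tenant_id, feature), refreshed off the request path."""

    def __init__(self, fetch_all, clock=time.monotonic):
        self._fetch_all = fetch_all  # hypothetical: returns {(tenant, feature): entry}
        self._clock = clock
        self._entries = {}
        self._loaded_at = None

    def refresh(self):
        """What a background sync job would call every REFRESH_SECONDS."""
        self._entries = dict(self._fetch_all())
        self._loaded_at = self._clock()

    def is_stale(self):
        if self._loaded_at is None:
            return True
        return self._clock() - self._loaded_at >= REFRESH_SECONDS

    def can_access(self, tenant_id, feature):
        """Pure in-memory lookup: no database or cross-service call."""
        entry = self._entries.get((tenant_id, feature))
        if entry is None:
            return {"allowed": False, "feature": feature, "reason": "no_entitlement"}
        if not entry["active"]:
            return {"allowed": False, "feature": feature, "reason": "subscription_inactive"}
        if entry["remaining_quota"] <= 0:
            return {"allowed": False, "feature": feature, "reason": "quota_exceeded"}
        return {
            "allowed": True,
            "plan": entry["plan"],
            "feature": feature,
            "remaining_quota": entry["remaining_quota"],
        }


# Usage with values taken from the sample response above.
def fetch_all():
    return {
        ("acme", "advanced_search"): {
            "plan": "enterprise",
            "active": True,
            "remaining_quota": 8420,
        }
    }


cache = EntitlementCache(fetch_all)
cache.refresh()
decision = cache.can_access("acme", "advanced_search")
```

The three checks mirror the three questions in the text: feature access, quota, and subscription status. A tenant missing from the cache is denied here, which is one possible policy rather than a documented behavior.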

Pillar 3: Margin Guards

Margin Guards are circuit breakers for profitability. When a tenant's usage cost (COGS) approaches or exceeds their contract revenue, Aforo automatically intervenes — before the damage hits your P&L.

Meters vs. Entitlements vs. Margin Guards

| Concept | Role | When It Runs | Latency Impact |
|---|---|---|---|
| Meters | Sensors that capture raw usage events | Post-response (async) | 0ms (fully async) |
| Entitlements | Gatekeeper that checks access + quota | Pre-request (sync) | <5ms (edge cache) |
| Margin Guards | Circuit breaker for profitability | Pre-request (sync) | <5ms (same cache lookup) |

The Three Intervention Levels

MARGIN GUARD/ All Gateways
Level 1 — Warning (Margin < 20%): Aforo fires a webhook alert to the account manager. The API request proceeds normally. The customer is unaffected — but your Finance team knows a contract is trending unprofitable.
Level 2 — Throttle (Margin < 10%): Aforo injects an X-Aforo-Throttle: true header into the API response. Your application reads this header and switches to a lower-cost compute path (e.g., GPT-4o → GPT-4o-mini). The customer gets a response — but at a reduced cost to you.
Level 3 — Block (Margin < 0%): Aforo rejects the request at the gateway with a 429 Margin Limit Exceeded response. The request never reaches your backend. This is the financial kill switch — unprofitable traffic is stopped before it generates compute costs.
WARNING
Margin Guards operate on real-time COGS data. If your AI provider raises prices mid-month, Aforo detects the margin compression immediately — not 45 days later when the invoice arrives.
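
The three intervention levels can be sketched as a threshold function. This assumes margin is computed as (revenue - COGS) / revenue, which the text does not define explicitly. `Action`, `margin_action`, and `choose_model` are hypothetical names, and `choose_model` assumes the `X-Aforo-Throttle` header is visible to the application as a plain dict entry.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"          # Level 1: webhook alert, request proceeds
    THROTTLE = "throttle"  # Level 2: X-Aforo-Throttle: true
    BLOCK = "block"        # Level 3: 429 at the gateway


def gross_margin(revenue, cogs):
    """Margin as a fraction of revenue (assumed definition)."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue


def margin_action(revenue, cogs):
    """Map a tenant's current margin to the intervention levels above."""
    margin = gross_margin(revenue, cogs)
    if margin < 0.0:
        return Action.BLOCK
    if margin < 0.10:
        return Action.THROTTLE
    if margin < 0.20:
        return Action.WARN
    return Action.ALLOW


def choose_model(headers):
    """Application side of Level 2: take the cheaper path when throttled."""
    throttled = headers.get("X-Aforo-Throttle", "").lower() == "true"
    return "gpt-4o-mini" if throttled else "gpt-4o"


# Worked example: a contract worth 1,000 per period as COGS climbs.
for cogs in (750, 850, 920, 1010):
    print(cogs, margin_action(1000, cogs).value)
# 750 allow
# 850 warn
# 920 throttle
# 1010 block
```

A provider price increase shows up here as a higher `cogs` for the same usage, which is how the same contract can move from allow to warn without any change in customer behavior.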

The Data Flow

Edge Enforcement (<5ms): Entitlement + margin check from local Redis. No cross-network call on the request path.

Every API request follows a four-stage lifecycle:

request-lifecycle.txt
1. REQUEST HITS GATEWAY
   └── Kong / Apigee / AWS / Azure / MuleSoft receives the inbound request

2. AFORO EDGE CHECK (<5ms)
   ├── Redis cache lookup: tenant entitlements + quota + margin status
   ├── Decision: ALLOW | THROTTLE | BLOCK
   └── Response header injected: X-Aforo-Remaining: 8420

3. API PROCESSES REQUEST
   └── Your application logic executes (Aforo is invisible here)

4. ASYNC USAGE INGESTION
   ├── Gateway plugin fires async POST to Aforo Metering Engine
   ├── Event validated, deduplicated, enriched
   ├── Rated against active Offer rules
   └── Routed to billing pipeline (wallet drawdown or invoice accrual)
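
Stage 4 of the lifecycle above can be sketched as a small pipeline. This is a simplified illustration, not Aforo's Metering Engine: `RawEvent`, `IngestionPipeline`, the per-unit price table standing in for Offer rules, and the returned status strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    event_id: str   # assumed unique per event, used for deduplication
    tenant_id: str
    meter: str
    quantity: float


class IngestionPipeline:
    """Validate, deduplicate, enrich, rate, and route a usage event."""

    def __init__(self, unit_prices, tenant_plans):
        self._unit_prices = unit_prices    # {(plan, meter): price per unit}
        self._tenant_plans = tenant_plans  # {tenant_id: plan}
        self._seen = set()
        self.accrued = {}                  # tenant_id -> amount owed so far

    def ingest(self, event):
        # 1. Validate: reject malformed events early.
        if not event.event_id or event.quantity < 0:
            return "rejected"
        # 2. Deduplicate: a retried POST must not be billed twice.
        if event.event_id in self._seen:
            return "duplicate"
        self._seen.add(event.event_id)
        # 3. Enrich: attach the tenant's plan.
        plan = self._tenant_plans.get(event.tenant_id)
        # 4. Rate: price the quantity against the active rule.
        price = self._unit_prices.get((plan, event.meter))
        if price is None:
            return "unrated"
        amount = event.quantity * price
        # 5. Route: accrue toward an invoice (a wallet drawdown would subtract instead).
        self.accrued[event.tenant_id] = self.accrued.get(event.tenant_id, 0.0) + amount
        return "accrued"


# Usage: the second POST is a retry of the first and is ignored.
pipeline = IngestionPipeline(
    unit_prices={("enterprise", "api_calls"): 0.25},
    tenant_plans={"acme": "enterprise"},
)
first = pipeline.ingest(RawEvent("evt-1", "acme", "api_calls", 2))
retry = pipeline.ingest(RawEvent("evt-1", "acme", "api_calls", 2))
```

Deduplicating before rating matters because the gateway plugin fires asynchronously and may retry on failure; keying on an event identifier keeps a retry from accruing twice.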

The Latency Guarantee

Edge Decision (<5ms): Entitlement + margin check at the gateway
Metering Overhead (0ms): Usage ingestion is fully async, post-response
Cache Refresh (30s): Entitlement cache synced from source of truth

PRO TIP
Aforo keeps your hot path lean. The entitlement check reads from a local Redis instance co-located with your gateway and completes in under 5ms. Usage metering fires asynchronously after the response is returned, so it adds nothing to your P99.