Margin Guards: Profitability Governance

Infrastructure insurance for the Agentic Era. Protect unit economics with sub-5ms circuit breakers at the gateway edge.

Updated 2026-06-15

Configuration#

Margin Guards are defined as YAML configuration that lives alongside your infrastructure-as-code. Each guard targets a specific metric, sets a minimum acceptable margin, and defines escalating intervention levels. No application code changes required.

# aforo-margin-config.yaml
guards:
  - id: reasoning-token-safety
    metric: ai_reasoning_tokens
    min_margin: 25%
    actions:
      level_1:
        threshold: 30%
        notify: [ "engineering-alerts", "finance-ops" ]
      level_2:
        threshold: 20%
        throttle_rate: 5req/sec
      level_3:
        threshold: 10%
        action: block_request
INFO
Margin Guard configs are hot-reloaded. Changes take effect within 30 seconds without restarting your gateway or redeploying any service.
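
The escalation in the config above can be sketched in a few lines of Python. This is only an illustration of how the three thresholds might map to a level, not Aforo's implementation: the `GUARD` dict is a hand-written mirror of the YAML, `parse_percent` and `active_level` are hypothetical helper names, and the strict "below the threshold" comparison is an assumption.

```python
from typing import Optional

# Hypothetical in-memory mirror of aforo-margin-config.yaml.
# A real gateway plugin would load the YAML file itself.
GUARD = {
    "id": "reasoning-token-safety",
    "metric": "ai_reasoning_tokens",
    "actions": {
        "level_1": {"threshold": "30%", "notify": ["engineering-alerts", "finance-ops"]},
        "level_2": {"threshold": "20%", "throttle_rate": "5req/sec"},
        "level_3": {"threshold": "10%", "action": "block_request"},
    },
}


def parse_percent(value: str) -> float:
    """Turn a string such as "30%" into the float 30.0."""
    return float(value.strip().rstrip("%"))


def active_level(guard: dict, margin_pct: float) -> Optional[str]:
    """Return the most severe level whose threshold the margin is below."""
    breached = [
        (parse_percent(cfg["threshold"]), name)
        for name, cfg in guard["actions"].items()
        if margin_pct < parse_percent(cfg["threshold"])  # assumption: strict "<"
    ]
    # The lowest breached threshold corresponds to the most severe level.
    return min(breached)[1] if breached else None
```

With these thresholds, a margin of 25% maps to `level_1`, 15% to `level_2`, 5% to `level_3`, and anything at or above 30% triggers no intervention.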

The Margin Formula#

Aforo computes margin in real time using the following formula, evaluated on every request against the tenant's current billing period:

Margin % = ((Revenue - COGS) / Revenue) × 100
Revenue = contracted price for the billing period. COGS = accumulated provider cost (tokens, compute, storage).

Revenue is pulled from the active subscription's rate plan. COGS is aggregated from provider cost events ingested through the metering engine. The margin percentage is cached in Redis at the gateway edge and recalculated every time a new cost event arrives.
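
As a worked example of the formula, the sketch below computes the margin for one tenant and billing period. The function name `margin_pct` and the dollar figures are illustrative assumptions; only the formula itself comes from this page.

```python
def margin_pct(revenue: float, cogs: float) -> float:
    """Margin % = ((Revenue - COGS) / Revenue) * 100 for one billing period."""
    if revenue <= 0:
        # The formula divides by revenue, so it is undefined for zero revenue.
        raise ValueError("revenue must be positive")
    return (revenue - cogs) / revenue * 100


# A tenant contracted at 1000 for the period with 750 of accumulated
# provider cost sits at a 25% margin.
print(margin_pct(1000, 750))   # 25.0

# Once accumulated COGS exceeds revenue, the margin goes negative.
print(margin_pct(1000, 1250))  # -25.0
```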

Level 1 Warning#

MARGIN GUARD/ All Gateways
Level 1 — Warning (Margin drops below 30%)

When the margin for a tenant-metric pair falls below the Level 1 threshold, Aforo fires a webhook notification to the configured channels (Slack, PagerDuty, email). The API request proceeds normally and the customer experiences no degradation. This early warning gives your Finance and Engineering teams time to investigate before the situation escalates. Common causes include an unexpected spike in reasoning tokens, a provider price increase, or a customer exploiting an underpriced tier.

Level 2 Throttle#

Level 2 — Throttle (Margin drops below 20%)

When the margin breaches the Level 2 threshold, Aforo injects an X-Aforo-Throttle: true header and rate-limits the tenant to the configured throttle rate (e.g., 5 req/sec). Your application reads this header and can optionally switch to a lower-cost compute path (e.g., GPT-4o to GPT-4o-mini, or high-fidelity search to approximate search). The customer still receives responses but at reduced throughput. This buys time for the account team to renegotiate the contract or adjust the rate plan.
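
On the application side, reacting to the throttle signal could look like the minimal sketch below. The header name `X-Aforo-Throttle` comes from this page; the function names, the plain dict of request headers, and the model identifiers used as defaults are assumptions for illustration.

```python
from typing import Mapping

THROTTLE_HEADER = "X-Aforo-Throttle"


def is_throttled(headers: Mapping[str, str]) -> bool:
    """Return True when the gateway marked the request as throttled."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == THROTTLE_HEADER.lower():
            return value.strip().lower() == "true"
    return False


def choose_model(
    headers: Mapping[str, str],
    default: str = "gpt-4o",
    fallback: str = "gpt-4o-mini",
) -> str:
    """Pick the lower-cost compute path while the tenant is throttled."""
    return fallback if is_throttled(headers) else default
```

Calling `choose_model({"x-aforo-throttle": "true"})` returns the fallback model, while a request without the header keeps the default.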

Level 3 Block#

Level 3 — Block (Margin drops below 10%)

When the margin falls below the Level 3 threshold, Aforo rejects the request at the gateway with a 429 Margin Limit Exceeded response. The request never reaches your backend, so no compute cost is incurred. This is the financial kill switch: unprofitable traffic is stopped before it generates provider charges. The customer receives a clear error with a Retry-After header indicating when the next billing period begins.
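
A client that receives this 429 can turn the Retry-After header into a wait time. This page does not say which of the two standard HTTP forms Aforo sends (a number of seconds or an HTTP date), so the sketch below handles both; `retry_delay_seconds` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after: str, now: datetime) -> float:
    """Seconds to wait before retrying, given a Retry-After header value."""
    value = retry_after.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "3600".
        return float(value)
    # HTTP-date form, e.g. "Wed, 01 Jul 2026 00:00:00 GMT".
    target = parsedate_to_datetime(value)
    if target.tzinfo is None:
        # Treat a date without zone information as UTC.
        target = target.replace(tzinfo=timezone.utc)
    # Never return a negative delay if the date is already in the past.
    return max(0.0, (target - now).total_seconds())


now = datetime(2026, 6, 30, 23, 0, 0, tzinfo=timezone.utc)
print(retry_delay_seconds("Wed, 01 Jul 2026 00:00:00 GMT", now))  # 3600.0
```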

Gateway Enforcement#

<5ms
Edge Decision Latency
Margin check reads from local Redis. No cross-network call, so the check adds less than 5 ms to request latency.

Margin Guards are enforced at the gateway edge by the same plugins that handle entitlement checks. Kong, Apigee, AWS API Gateway, Azure APIM, and MuleSoft plugins all automatically read the margin signal from the local Redis cache and apply the configured action. No backend code changes are required.

Kong: Lua plugin reads margin from Redis in the access phase
Apigee: Shared Flow checks margin via KVM lookup
MuleSoft: Custom policy evaluates DataWeave expression
AWS API GW: Lambda authorizer checks margin cache
Azure APIM: Inbound policy fragment reads Redis
Direct SDK: Middleware decorator checks margin inline
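
For the Direct SDK path, an inline check could be shaped like the decorator below. This is a sketch under stated assumptions, not the Aforo SDK: `margin_guard`, `MarginBlocked`, and the `MARGINS` dict (standing in for the Redis margin cache) are all hypothetical names, and the 10% default mirrors the Level 3 threshold used on this page.

```python
import functools
from typing import Callable


class MarginBlocked(Exception):
    """Raised when a tenant's margin is below the block threshold."""


def margin_guard(get_margin: Callable[[str], float], block_below: float = 10.0):
    """Wrap a handler so it only runs while the tenant's margin is healthy."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(tenant_id: str, *args, **kwargs):
            margin = get_margin(tenant_id)
            if margin < block_below:
                # Stop before the handler runs, so no provider cost is incurred.
                raise MarginBlocked(f"{tenant_id} margin {margin}% is below {block_below}%")
            return handler(tenant_id, *args, **kwargs)
        return wrapper
    return decorator


# Hypothetical stand-in for the local margin cache.
MARGINS = {"tenant-a": 42.0, "tenant-b": 6.5}


@margin_guard(MARGINS.__getitem__)
def handle_request(tenant_id: str, prompt: str) -> str:
    return f"{tenant_id}: {prompt}"
```

Here `handle_request("tenant-a", "hi")` runs normally, while the same call for `tenant-b` raises `MarginBlocked` before the handler body executes.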
WARNING
Margin Guard signals are propagated to the gateway cache within 30 seconds of a cost event. During that window, requests proceed at the previous margin level. For mission-critical cost control, reduce the cache refresh interval to 5 seconds in the gateway plugin configuration.
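
To see what the propagation window means in practice, the rough estimate below multiplies a tenant's request rate by the refresh interval. The 50 req/sec rate and the per-request cost parameter are made-up inputs for illustration; only the 30-second and 5-second intervals come from this page.

```python
def stale_window_requests(requests_per_sec: float, refresh_interval_sec: float) -> float:
    """Upper bound on requests served at the previous margin level."""
    return requests_per_sec * refresh_interval_sec


def stale_window_cost(
    requests_per_sec: float,
    refresh_interval_sec: float,
    cost_per_request: float,
) -> float:
    """Upper bound on provider cost incurred before the new level applies."""
    return stale_window_requests(requests_per_sec, refresh_interval_sec) * cost_per_request


# A tenant sending 50 req/sec (hypothetical figure):
print(stale_window_requests(50, 30))  # 1500 requests with the default window
print(stale_window_requests(50, 5))   # 250 requests with a 5-second refresh
```

Shortening the refresh interval from 30 to 5 seconds cuts the worst-case exposure by a factor of six, at the price of more frequent cache refreshes.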

Best Practices#

Rollout Strategy

Always start with Level 1 in production to gather baseline data. Enable Level 3 only after you have collected 30 days of margin data. Set Level 2 throttle rates based on your P95 traffic.

Week 1-2: Deploy Level 1 only. Monitor webhook alerts. Build a margin dashboard.
Week 3-4: Analyze which tenants trigger warnings. Adjust thresholds per metric.
Month 2: Enable Level 2 throttle on your highest-cost metrics (e.g., reasoning tokens).
Month 2+: Enable Level 3 block only after confirming the throttle rate is calibrated correctly.

Threshold Guidelines

Metric Type            Recommended L1   Recommended L3
AI Reasoning Tokens    30%              10%
Standard API Calls     20%              5%
Storage / Egress       25%              8%
Agent Sessions         35%              15%
PRO TIP
AI reasoning tokens have the highest variance and the largest impact on COGS. Set wider thresholds (30% L1, 10% L3) for AI metrics to avoid false positives during burst traffic.