Design a Quota System
Role: Software Engineer
Design a quota management system that tracks and enforces resource usage limits across multiple backend services. Think Google's unified 100 GB storage quota shared between Drive, Photos, and Gmail—every upload must be checked to prevent users from exceeding their limit, even under high concurrency.
The core challenge isn't CRUD—it's strict enforcement under high concurrency without sacrificing latency or availability. A naïve implementation either allows overdraft or creates a performance bottleneck.
This walkthrough follows the Interview Framework. Use it as a guide, not a script.
Phase 1: Requirements
Functional Requirements
- Services should be able to consume quota on behalf of a user (e.g., Drive requests 5 GB for a file upload)
- Services should be able to release quota when resources are freed (e.g., user deletes a file)
- Services should be able to cancel in-flight reservations for failed/aborted operations
- Users should be able to query their current usage across all services
Assume a single global quota per user (not per-service sub-quotas) unless the interviewer adds this. Features like quota tiers by plan, admin overrides, and usage analytics are below the line.
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Strict consistency | No overdraft at any time | If 2 GB remaining, two concurrent 2 GB requests must not both succeed |
| Low latency (P99) | < 50ms | Quota checks are on the critical path of every upload/write |
| High availability | 99.99% | Quota service downtime blocks all upstream services |
| High throughput | 10K–100K+ QPS | Aggregated write traffic across all services is large |
The "no overdraft" requirement is what makes this problem hard. It rules out eventually consistent approaches and forces you to think carefully about concurrency control. This is the defining constraint the interviewer will probe.
Capacity Estimation
| Metric | Value |
|---|---|
| Daily Active Users | 10 million |
| Avg quota operations/user/day | 20 (uploads, deletes, edits) |
| Throughput | 10M users × 20 ops = 200M ops/day ÷ 86,400 ≈ 2,300 QPS avg; assuming a ~10× burst factor, 23K+ QPS peak |
| Quota record size | ~100 bytes (user_id, used, limit, reserved) |
| Total storage | 10M users × 100B = 1 GB |
Quota data is tiny—one row per user. This is a compute-bound problem (high aggregate write QPS), not a storage-bound problem. The back-of-envelope tells you sharding is about concurrency distribution, not data size.
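The arithmetic above is worth sanity-checking in a few lines. A quick sketch (the 10× peak multiplier is an assumption, not a measured number):

```python
# Back-of-envelope capacity check for the quota system.
DAU = 10_000_000            # daily active users
OPS_PER_USER = 20           # avg quota operations per user per day
RECORD_BYTES = 100          # approx size of one UserQuota row
PEAK_FACTOR = 10            # assumed burst multiplier over the daily average

ops_per_day = DAU * OPS_PER_USER                 # 200M ops/day
avg_qps = ops_per_day / 86_400                   # ~2,300 QPS
peak_qps = avg_qps * PEAK_FACTOR                 # ~23K QPS
total_storage_gb = DAU * RECORD_BYTES / 1e9      # 1 GB

print(round(avg_qps), round(peak_qps), total_storage_gb)
```

The storage line is the takeaway: one small row per user totals about a gigabyte, so all scaling effort goes into write throughput, not data volume.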
Phase 2: Data Model
Core Entities
UserQuota
├── user_id (PK)
├── quota_limit // e.g., 100 GB (set by plan/tier)
├── used // confirmed usage across all services
├── reserved // temporarily held, pending confirmation
├── updated_at
Reservation
├── reservation_id (PK)
├── user_id (FK)
├── service_id // which upstream service (drive, photos, gmail)
├── idempotency_key // dedupe retries from same service request
├── amount // how much quota reserved
├── status // pending | confirmed | released | expired
├── created_at
├── expires_at // TTL for abandoned reservations
Key Insight: Reserve → Confirm/Cancel
The data model separates reserved from used because upstream operations (like file uploads) take time. A 5 GB file upload reserves quota immediately but only confirms it after the upload completes.
Available quota = quota_limit - used - reserved
A common mistake is tracking only used quota. Without reservations, two concurrent uploads can each see "10 GB available" and both proceed, causing overdraft. The reserved field is essential for strict enforcement.
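The availability formula can be captured in a minimal sketch (illustrative Python, not the service's actual code; names like `can_reserve` are invented here):

```python
from dataclasses import dataclass

GB = 1024**3

@dataclass
class UserQuota:
    quota_limit: int
    used: int      # confirmed usage
    reserved: int  # in-flight, pending confirmation

    @property
    def available(self) -> int:
        # Available must subtract BOTH confirmed usage and in-flight
        # reservations, or concurrent uploads can overdraft.
        return self.quota_limit - self.used - self.reserved

def can_reserve(q: UserQuota, amount: int) -> bool:
    return q.available >= amount

q = UserQuota(quota_limit=100 * GB, used=50 * GB, reserved=0)
assert can_reserve(q, 5 * GB)
q.reserved += 5 * GB               # first upload reserves immediately
assert q.available == 45 * GB
# A second concurrent 50 GB request now correctly fails instead of
# both requests seeing "50 GB available" and overdrafting:
assert not can_reserve(q, 50 * GB)
```

In the real system this check and the increment happen atomically in the database, not in application memory; the sketch only illustrates the accounting.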
Phase 3: API Design
Reserve Quota (Synchronous — Critical Path)
POST /v1/quota/reserve
Headers:
X-Service-Id: google-drive
Idempotency-Key: upload_abc123
Request:
{
"user_id": "user_456",
"amount_bytes": 5368709120 // 5 GB
}
Response (200):
{
"reservation_id": "res_789",
"available_after": 53687091200,
"expires_at": "2024-01-15T12:30:00Z"
}
Response (409 Conflict):
{
"error": "INSUFFICIENT_QUOTA",
"available": 2147483648, // only 2 GB available
"requested": 5368709120
}
Confirm Reservation
POST /v1/quota/confirm
{ "reservation_id": "res_789" }
Cancel Reservation (Failed/Aborted Upload)
POST /v1/quota/cancel
{ "reservation_id": "res_789" }
Release Used Quota (Delete Path)
POST /v1/quota/release
{
"user_id": "user_456",
"amount_bytes": 5368709120,
"service_id": "google-drive",
"reference_id": "file_xyz"
}
Get Usage
GET /v1/quota/usage?user_id=user_456
Response:
{
"user_id": "user_456",
"quota_limit": 107374182400, // 100 GB
"used": 53687091200, // 50 GB confirmed
"reserved": 5368709120, // 5 GB pending
"available": 48318382080 // ~45 GB available
}
The reserve endpoint must be synchronous because the caller needs an immediate yes/no before proceeding with the upload. Confirm/cancel should also be synchronous for correctness. Releasing already-used quota (delete path) can be async because delayed crediting is a UX issue, not an overdraft risk.
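From the caller's side, the endpoints compose into a reserve → work → confirm/cancel pattern. A hedged sketch of that caller logic (the `QuotaClient` here is a toy in-memory stand-in, not a published SDK):

```python
class InsufficientQuota(Exception):
    """Maps to the HTTP 409 INSUFFICIENT_QUOTA response."""

class QuotaClient:
    """Toy in-memory stand-in for the quota service (illustrative only)."""
    def __init__(self, limit: int):
        self.limit, self.used = limit, 0
        self.reserved = {}            # reservation_id -> amount
        self._next_id = 0

    def reserve(self, amount: int) -> str:
        if self.limit - self.used - sum(self.reserved.values()) < amount:
            raise InsufficientQuota()
        self._next_id += 1
        rid = f"res_{self._next_id}"
        self.reserved[rid] = amount
        return rid

    def confirm(self, rid: str) -> None:
        self.used += self.reserved.pop(rid)

    def cancel(self, rid: str) -> None:
        self.reserved.pop(rid, None)  # idempotent

def upload(client: QuotaClient, size: int, do_upload) -> bool:
    rid = client.reserve(size)        # synchronous yes/no before the upload
    try:
        do_upload()
    except Exception:
        client.cancel(rid)            # best-effort; TTL expiration is the backstop
        return False
    client.confirm(rid)               # convert reserved -> used
    return True
```

The key shape: the caller never touches `used` directly; it only moves quota through the reserve/confirm/cancel lifecycle, and a crash between reserve and confirm is recovered by the TTL.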
Phase 4: High-Level Design
Architecture Overview
Component Responsibilities
| Component | Responsibility |
|---|---|
| Upstream Services | Drive, Photos, Gmail call quota service before consuming resources |
| Load Balancer | Routes requests; can use consistent hashing on user_id for locality |
| Quota Servers | Stateless application servers handling reserve/release/query logic |
| PostgreSQL | Source of truth for quota data, sharded by user_id |
| Event Bus (Optional) | Async fanout for analytics, audit logs, and downstream consumers |
| Expiration Worker | Background job that reclaims expired reservations |
Reserve Flow (Critical Path)
This is the most important flow—it must be atomic to prevent overdraft:
BEGIN;
-- Idempotency: unique(service_id, idempotency_key)
SELECT reservation_id, status, amount
FROM reservations
WHERE service_id = :service_id
  AND idempotency_key = :idempotency_key
FOR UPDATE;
-- If a row exists, return it (safe retry); otherwise:

WITH reserved_row AS (
  UPDATE user_quota
  SET reserved = reserved + :amount,
      updated_at = NOW()
  WHERE user_id = :user_id
    AND (quota_limit - used - reserved) >= :amount
  RETURNING user_id
)
INSERT INTO reservations (
  reservation_id, user_id, service_id, idempotency_key, amount, status, created_at, expires_at
)
SELECT :reservation_id, user_id, :service_id, :idempotency_key, :amount, 'pending', NOW(), NOW() + INTERVAL '30 minutes'
FROM reserved_row;
COMMIT;
Walkthrough:
- Drive wants to upload a 5 GB file for user_456
- Drive calls POST /v1/quota/reserve with amount = 5 GB
- Quota Server starts a transaction, checks idempotency, then runs the conditional UPDATE + INSERT reservation
- If the INSERT affects one row, the reservation succeeds → return reservation_id
- If the conditional UPDATE finds no row, the INSERT gets no input row → return 409 (insufficient quota)
- Drive proceeds with the upload only if the reservation succeeded
- After the upload completes, Drive calls POST /v1/quota/confirm to convert reserved → used
- If Drive crashes or the upload fails, the reservation expires after its TTL
The conditional UPDATE is still the core concurrency guard. Wrapping it with reservation creation in the same transaction gives both guarantees: no overdraft and no "reserved bytes without a reservation row" inconsistencies.
Confirm Flow
BEGIN;
WITH target AS (
  SELECT reservation_id, user_id, amount, status
  FROM reservations
  WHERE reservation_id = :reservation_id
  FOR UPDATE
)
UPDATE user_quota q
SET used = q.used + t.amount,
    reserved = q.reserved - t.amount,
    updated_at = NOW()
FROM target t
WHERE q.user_id = t.user_id
  AND t.status = 'pending';

UPDATE reservations
SET status = 'confirmed'
WHERE reservation_id = :reservation_id
  AND status = 'pending';
COMMIT;
If the reservation is already confirmed, return success (idempotent retry). If it's released or expired, return 409.
Cancel Reservation Flow (Failed Upload)
BEGIN;
WITH target AS (
  UPDATE reservations
  SET status = 'released'
  WHERE reservation_id = :reservation_id
    AND status = 'pending'
  RETURNING user_id, amount
)
UPDATE user_quota q
SET reserved = q.reserved - t.amount,
    updated_at = NOW()
FROM target t
WHERE q.user_id = t.user_id;
COMMIT;
Release Used Quota Flow (Delete Path)
UPDATE user_quota
SET used = used - :amount,
    updated_at = NOW()
WHERE user_id = :user_id
  AND used >= :amount;
Reservation Expiration
If a service reserves quota but never confirms (crash, abandoned upload), the reservation must be reclaimed:
-- Worker loop: process in batches to avoid long transactions
WITH expired AS (
  UPDATE reservations
  SET status = 'expired'
  WHERE reservation_id IN (
    SELECT reservation_id
    FROM reservations
    WHERE status = 'pending'
      AND expires_at < NOW()
    ORDER BY expires_at
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
  )
  RETURNING user_id, amount
)
UPDATE user_quota q
SET reserved = q.reserved - e.total,
    updated_at = NOW()
FROM (
  -- Aggregate per user: a user may have several expired reservations in
  -- one batch, and UPDATE ... FROM applies at most one matching row
  SELECT user_id, SUM(amount) AS total FROM expired GROUP BY user_id
) e
WHERE q.user_id = e.user_id;
Reservation TTL creates a trade-off: too short and legitimate slow uploads lose quota mid-upload; too long and abandoned reservations block available quota. Default to 30 minutes with configurable TTLs for large uploads.
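The loop driving this batch query is simple. A sketch, assuming a hypothetical `reclaim_batch` callable that runs the SQL above in one transaction and returns how many reservations it expired:

```python
def drain_expired(reclaim_batch, batch_size: int = 1000) -> int:
    """Run reclaim batches until the backlog is drained.

    reclaim_batch() is assumed to execute the batched expiration SQL in its
    own transaction and return the number of reservations expired. A batch
    smaller than batch_size signals there is nothing left to reclaim.
    """
    total = 0
    while True:
        n = reclaim_batch()
        total += n
        if n < batch_size:
            return total
```

Because the SQL uses FOR UPDATE SKIP LOCKED, several workers can run this loop concurrently without blocking on each other's batches; a scheduler would invoke `drain_expired` periodically rather than spinning.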
Phase 5: Scaling & Trade-offs
1. Handling High Concurrency (Aggregate Load)
At peak loads of tens of thousands of QPS, a single PostgreSQL instance becomes a bottleneck due to the aggregate volume of atomic UPDATE queries. The issue isn't typically per-user "hot rows" (a single user rarely generates enough QPS to cause lock contention), but the total database write throughput.
Solution: Shard by User ID
Since each user's quota is independent, shard user_quota by user_id:
With 100 shards, each shard sees roughly 1/100th of total QPS (assuming the hash spreads users evenly). Since each user's operations go to exactly one shard, the atomic UPDATE still works correctly.
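Shard routing can be as simple as a stable hash of the user ID. A sketch (a production system would more likely use a shard map or consistent hashing to allow resharding; this fixed-modulo version is the minimal form):

```python
import hashlib

NUM_SHARDS = 100

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stable hash: the same user always lands on the same shard, so the
    # conditional UPDATE on that user's quota row stays on one database.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

The property that matters is determinism: reserve, confirm, cancel, and release for one user all route to the same shard, so no cross-shard transaction is ever needed.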
2. Why NOT a Cache Layer?
A natural instinct is to add Redis in front of the database. But for quota enforcement, this creates serious problems:
| Approach | Issue |
|---|---|
| Cache-aside | Two servers read "10 GB available" from cache, both proceed → overdraft |
| Write-through | Cache and DB can become inconsistent during failures |
| Redis as primary | Even with persistence enabled, Redis can lose acknowledged writes on failover; too weak to be the source of truth for quota |
For strict quota enforcement, the database IS the coordination layer. Adding a cache splits the atomic check-and-deduct across two systems and breaks the guarantee. Only cache the read path (GET /usage) where stale data is acceptable.
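Caching only the read path might look like the following sketch (illustrative; the TTL and the `UsageReadCache` name are assumptions, not from the design above):

```python
import time

class UsageReadCache:
    """TTL cache for GET /usage only. Reserve/confirm/release never read it."""

    def __init__(self, fetch_from_db, ttl_s: float = 5.0):
        self._fetch = fetch_from_db        # callable: user_id -> usage dict
        self._ttl = ttl_s
        self._entries = {}                 # user_id -> (expires_at, usage)

    def get_usage(self, user_id: str):
        now = time.monotonic()
        hit = self._entries.get(user_id)
        if hit and hit[0] > now:
            return hit[1]                  # possibly seconds stale: fine for display
        usage = self._fetch(user_id)
        self._entries[user_id] = (now + self._ttl, usage)
        return usage
```

The enforcement path bypasses this entirely: the conditional UPDATE reads the row inside its own transaction, so a stale cache can never cause an overdraft, only a slightly out-of-date usage bar in the UI.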
3. Replication Strategy
The "no overdraft" requirement forces strong consistency:
- Synchronous replication for writes: Wait for a write quorum (for example, primary + one synchronous replica) before success. Preserves durability without requiring every replica.
- Any replica for reads: Usage queries are non-critical; reading from any replica gives lower latency.
Write path (reserve): Client → Primary → Write Quorum ACK → Response
Read path (usage): Client → Any Replica → Response
Be explicit about this trade-off: "We sacrifice some write latency for strong consistency because the cost of overdraft (giving users unpaid storage) is higher than slightly slower quota checks. The p99 for a conditional row update inside a short transaction with synchronous replication is typically 10-20ms—well within our 50ms budget."
4. Handling Reserved-but-Never-Used Quota
Three strategies work together:
TTL-based expiration (primary): Each reservation has expires_at. Background worker reclaims expired ones. Default: 30 min.
Heartbeat extension: For long-running operations, the upstream service periodically extends the reservation TTL. If heartbeats stop, the reservation expires normally.
Callback on failure: Upstream services should call cancel on failure (best-effort). TTL expiration is the safety net when callbacks fail.
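The heartbeat-extension strategy can be sketched as a small handler (hypothetical: the in-memory dict and the cap on total reservation lifetime are assumptions added here, not part of the design above):

```python
def extend_reservation(reservations: dict, rid: str,
                       extension_s: int, max_total_s: int, now: float) -> bool:
    """Heartbeat handler sketch: push expires_at forward on each heartbeat,
    capped relative to created_at so a buggy client that heartbeats forever
    cannot hold quota indefinitely."""
    r = reservations.get(rid)
    if r is None or r["status"] != "pending":
        return False                      # nothing to extend
    new_expiry = now + extension_s
    cap = r["created_at"] + max_total_s   # absolute lifetime ceiling
    r["expires_at"] = min(new_expiry, cap)
    return True
```

When heartbeats stop, `expires_at` stops moving and the normal expiration worker reclaims the reservation.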
5. Scaling Levels
| Scale | Architecture |
|---|---|
| < 10K QPS | Single PostgreSQL with connection pooling |
| 10K–100K QPS | Sharded PostgreSQL (10–50 shards) |
| 100K–1M QPS | Sharded PostgreSQL (50–200 shards) + read replicas |
| > 1M QPS | Per-service local quota budgets with periodic sync |
6. Per-Service Local Budgets (Advanced — Extreme Scale Only)
Central quota service allocates a "budget" to each upstream service. Each service checks locally—no network call needed. When budget runs low, request more from central.
Trade-off: Sacrifices strict per-byte enforcement for throughput. A user might briefly see different available quota from different services. Only use if strict consistency can be relaxed.
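The local-budget idea resembles a client-side token bucket refilled from the central service in chunks. A sketch under those assumptions (the `LocalQuotaBudget` name and refill interface are invented):

```python
class LocalQuotaBudget:
    """Per-service local budget: consume locally, refill from central in chunks.

    Trades strict per-byte enforcement for fewer network round-trips: most
    consumes are served from the local balance with no RPC at all.
    """

    def __init__(self, request_from_central, chunk: int):
        # request_from_central(n) is an assumed RPC that asks the central
        # quota service for n bytes and returns how many were granted (<= n).
        self._refill = request_from_central
        self._chunk = chunk
        self._local = 0

    def try_consume(self, amount: int) -> bool:
        if self._local < amount:
            self._local += self._refill(self._chunk)  # one RPC per chunk, not per op
        if self._local >= amount:
            self._local -= amount
            return True
        return False
```

Unused local budget is exactly the consistency gap the trade-off describes: bytes granted to one service look consumed to every other service until returned or spent.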
Common Pitfalls
Using read-then-write instead of atomic updates — Checking available quota in one query and deducting in another creates a TOCTOU race condition. Two concurrent requests can both pass the check and both deduct, causing overdraft.
Adding a cache for quota enforcement — Caching introduces stale reads that break strict consistency. The database is the coordination point. Only cache for non-critical read paths.
Forgetting reservation expiration — Without TTLs on reservations, crashed services permanently "leak" quota. Always implement a background expiration worker.
Sharding by service instead of user — A global user quota would be split across service-owned shards, forcing cross-shard coordination for every reserve check. Shard by user_id so each user's quota is authoritative in one place.
Interview Checklist
| Topic | Key Points to Cover |
|---|---|
| Strict enforcement | Conditional UPDATE inside a transaction prevents overdraft |
| Reserve-confirm-cancel | Explicit cancel path handles failed/aborted in-flight operations |
| Reservation TTL | Background worker reclaims abandoned reservations |
| No cache for writes | Cache introduces consistency gaps; DB is the coordination layer |
| Sharding | Shard by user_id to distribute aggregate write load |
| Replication | Synchronous writes for durability, async reads for performance |
| Back-of-envelope | Tiny data (1 GB), compute-bound problem, not storage-bound |
| Scaling levels | Single DB → Sharded → Local budgets depending on QPS |
Summary
| Aspect | Solution | Rationale |
|---|---|---|
| Core abstraction | UserQuota + Reservations | Separates confirmed usage from in-flight operations |
| Overdraft prevention | Conditional UPDATE + transactional insert | No TOCTOU race and no orphaned reservation state |
| In-flight operations | Reserve → Confirm / Cancel | TTL expiration reclaims abandoned quota |
| Concurrency | Shard by user_id | Distributes aggregate write load across DB shards |
| Consistency | Strong for writes, eventual for reads | No overdraft guarantee requires strong writes |
| Scaling | Stateless servers + sharded PostgreSQL | Handles 10K–1M QPS with horizontal scaling |
| Extreme scale | Per-service local budgets | Trades strict consistency for throughput |