
Design a Quota System

Role: Software Engineer


Design a quota management system that tracks and enforces resource usage limits across multiple backend services. Think Google's unified 100 GB storage quota shared between Drive, Photos, and Gmail—every upload must be checked to prevent users from exceeding their limit, even under high concurrency.

The core challenge isn't CRUD—it's strict enforcement under high concurrency without sacrificing latency or availability. A naïve implementation either allows overdraft or creates a performance bottleneck.

This walkthrough follows the Interview Framework. Use it as a guide, not a script.

Phase 1: Requirements

Functional Requirements

  • Services should be able to consume quota on behalf of a user (e.g., Drive requests 5 GB for a file upload)

  • Services should be able to release quota when resources are freed (e.g., user deletes a file)

  • Services should be able to cancel in-flight reservations for failed/aborted operations

  • Users should be able to query their current usage across all services

Assume a single global quota per user (not per-service sub-quotas) unless the interviewer adds this. Features like quota tiers by plan, admin overrides, and usage analytics are below the line.

Non-Functional Requirements

| Requirement | Target | Rationale |
|---|---|---|
| Strict consistency | No overdraft at any time | If 2 GB remaining, two concurrent 2 GB requests must not both succeed |
| Low latency (P99) | < 50ms | Quota checks are on the critical path of every upload/write |
| High availability | 99.99% | Quota service downtime blocks all upstream services |
| High throughput | 10K–100K+ QPS | Aggregated write traffic across all services is large |

The "no overdraft" requirement is what makes this problem hard. It rules out eventually consistent approaches and forces you to think carefully about concurrency control. This is the defining constraint the interviewer will probe.

Capacity Estimation

| Metric | Value |
|---|---|
| Daily Active Users | 10 million |
| Avg quota operations/user/day | 20 (uploads, deletes, edits) |
| QPS (avg / peak) | 200M ops/day ÷ 86,400 ≈ 2,300 QPS avg, 23K+ QPS peak |
| Quota record size | ~100 bytes (user_id, used, limit, reserved) |
| Total storage | 10M users × 100 B = 1 GB |

Quota data is tiny—one row per user. This is a compute-bound problem (high aggregate write QPS), not a storage-bound problem. The back-of-envelope tells you sharding is about concurrency distribution, not data size.


Phase 2: Data Model

Core Entities

text
UserQuota
├── user_id (PK)
├── quota_limit          // e.g., 100 GB (set by plan/tier)
├── used                 // confirmed usage across all services
├── reserved             // temporarily held, pending confirmation
├── updated_at

Reservation
├── reservation_id (PK)
├── user_id (FK)
├── service_id           // which upstream service (drive, photos, gmail)
├── idempotency_key      // dedupe retries from same service request
├── amount               // how much quota reserved
├── status               // pending | confirmed | released | expired
├── created_at
├── expires_at           // TTL for abandoned reservations

Key Insight: Reserve → Confirm/Cancel

The data model separates reserved from used because upstream operations (like file uploads) take time. A 5 GB file upload reserves quota immediately but only confirms it after the upload completes.

Available quota = quota_limit - used - reserved

A common mistake is tracking only used quota. Without reservations, two concurrent uploads can each see "10 GB available" and both proceed, causing overdraft. The reserved field is essential for strict enforcement.
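
A minimal DDL sketch of these two entities, assuming PostgreSQL; the column types, CHECK constraints, and partial index are illustrative choices, not part of the original model:

sql
-- Sketch only: types and constraint details are assumptions.
CREATE TABLE user_quota (
    user_id      TEXT PRIMARY KEY,
    quota_limit  BIGINT NOT NULL,                       -- e.g., 100 GB in bytes
    used         BIGINT NOT NULL DEFAULT 0,
    reserved     BIGINT NOT NULL DEFAULT 0,
    updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CHECK (used >= 0 AND reserved >= 0),
    CHECK (used + reserved <= quota_limit)              -- belt-and-suspenders overdraft guard
);

CREATE TABLE reservations (
    reservation_id   TEXT PRIMARY KEY,
    user_id          TEXT NOT NULL REFERENCES user_quota (user_id),
    service_id       TEXT NOT NULL,                     -- drive, photos, gmail
    idempotency_key  TEXT NOT NULL,
    amount           BIGINT NOT NULL CHECK (amount > 0),
    status           TEXT NOT NULL DEFAULT 'pending',   -- pending | confirmed | released | expired
    created_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at       TIMESTAMPTZ NOT NULL,
    UNIQUE (service_id, idempotency_key)                -- dedupes retries of the same request
);

-- Lets the expiration worker find abandoned reservations quickly.
CREATE INDEX idx_reservations_pending_expiry
    ON reservations (expires_at) WHERE status = 'pending';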


Phase 3: API Design

Reserve Quota (Synchronous — Critical Path)

text
POST /v1/quota/reserve
Headers:
  X-Service-Id: google-drive
  Idempotency-Key: upload_abc123

Request:
{
  "user_id": "user_456",
  "amount_bytes": 5368709120   // 5 GB
}

Response (200):
{
  "reservation_id": "res_789",
  "available_after": 53687091200,
  "expires_at": "2024-01-15T12:30:00Z"
}

Response (409 Conflict):
{
  "error": "INSUFFICIENT_QUOTA",
  "available": 2147483648,  // only 2 GB available
  "requested": 5368709120
}

Confirm Reservation

text
POST /v1/quota/confirm
{ "reservation_id": "res_789" }

Cancel Reservation (Failed/Aborted Upload)

text
POST /v1/quota/cancel
{ "reservation_id": "res_789" }

Release Used Quota (Delete Path)

text
POST /v1/quota/release
{
  "user_id": "user_456",
  "amount_bytes": 5368709120,
  "service_id": "google-drive",
  "reference_id": "file_xyz"
}

Get Usage

text
GET /v1/quota/usage?user_id=user_456

Response:
{
  "user_id": "user_456",
  "quota_limit": 107374182400,   // 100 GB
  "used": 53687091200,            // 50 GB confirmed
  "reserved": 5368709120,         // 5 GB pending
  "available": 48318382080        // ~45 GB available
}

The reserve endpoint must be synchronous because the caller needs an immediate yes/no before proceeding with the upload. Confirm/cancel should also be synchronous for correctness. Releasing already-used quota (delete path) can be async because delayed crediting is a UX issue, not an overdraft risk.
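
On the read side, GET /usage is a single-row lookup; a minimal sketch against the user_quota table from Phase 2, with available derived rather than stored:

sql
SELECT user_id,
       quota_limit,
       used,
       reserved,
       quota_limit - used - reserved AS available
FROM user_quota
WHERE user_id = :user_id;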


Phase 4: High-Level Design

Architecture Overview

Component Responsibilities

| Component | Responsibility |
|---|---|
| Upstream Services | Drive, Photos, Gmail call quota service before consuming resources |
| Load Balancer | Routes requests; can use consistent hashing on user_id for locality |
| Quota Servers | Stateless application servers handling reserve/release/query logic |
| PostgreSQL | Source of truth for quota data, sharded by user_id |
| Event Bus (Optional) | Async fanout for analytics, audit logs, and downstream consumers |
| Expiration Worker | Background job that reclaims expired reservations |
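
Putting these together, the request path looks roughly like this (a text sketch, not an exact diagram):

text
Drive / Photos / Gmail
        │  reserve / confirm / cancel / release
        ▼
  Load Balancer (consistent hash on user_id)
        ▼
  Quota Servers (stateless)
        ▼
  PostgreSQL (sharded by user_id) ──► Event Bus (analytics, audit)
        ▲
  Expiration Worker (reclaims expired reservations)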

Reserve Flow (Critical Path)

This is the most important flow—it must be atomic to prevent overdraft:

sql
BEGIN;

-- Bind parameters (e.g., :user_id) are supplied by the quota server.
-- Idempotency: unique(service_id, idempotency_key)
SELECT reservation_id, status, amount
FROM reservations
WHERE service_id = :service_id
  AND idempotency_key = :idempotency_key
FOR UPDATE;
-- If a row exists, return it (safe retry)

WITH reserved_row AS (
  UPDATE user_quota
  SET reserved = reserved + :amount,
      updated_at = NOW()
  WHERE user_id = :user_id
    AND (quota_limit - used - reserved) >= :amount
  RETURNING user_id
)
INSERT INTO reservations (
  reservation_id, user_id, service_id, idempotency_key, amount, status, created_at, expires_at
)
SELECT :reservation_id, user_id, :service_id, :idempotency_key, :amount, 'pending', NOW(), NOW() + INTERVAL '30 minutes'
FROM reserved_row;

COMMIT;

Walkthrough:

  • Drive wants to upload a 5 GB file for user_456

  • Drive calls POST /v1/quota/reserve with amount = 5 GB

  • Quota Server starts a transaction, checks idempotency, then runs conditional UPDATE + INSERT reservation

  • If the INSERT affects one row, reservation succeeds → return reservation_id

  • If the conditional UPDATE finds no row, INSERT gets no input row → return 409 (insufficient quota)

  • Drive proceeds with upload only if reservation succeeded

  • After upload completes, Drive calls POST /v1/quota/confirm to convert reserved → used

  • If Drive crashes or upload fails, the reservation expires after TTL

The conditional UPDATE is still the core concurrency guard. Wrapping it with reservation creation in the same transaction gives both guarantees: no overdraft and no "reserved bytes without a reservation row" inconsistencies.

Confirm Flow

sql
BEGIN;

WITH target AS (
  SELECT reservation_id, user_id, amount, status
  FROM reservations
  WHERE reservation_id = :reservation_id
  FOR UPDATE
)
UPDATE user_quota q
SET used = q.used + t.amount,
    reserved = q.reserved - t.amount,
    updated_at = NOW()
FROM target t
WHERE q.user_id = t.user_id
  AND t.status = 'pending';

UPDATE reservations
SET status = 'confirmed'
WHERE reservation_id = :reservation_id
  AND status = 'pending';

COMMIT;

If the reservation is already confirmed, return success (idempotent retry). If it's released or expired, return 409.
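
One way to implement that response logic is to read back the reservation's status after the transaction; a small sketch (mapping to HTTP codes happens in the application layer):

sql
-- 'confirmed' → 200 (covers idempotent retries); 'released' or 'expired' → 409.
SELECT status
FROM reservations
WHERE reservation_id = :reservation_id;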

Cancel Reservation Flow (Failed Upload)

sql
BEGIN;

WITH target AS (
  UPDATE reservations
  SET status = 'released'
  WHERE reservation_id = :reservation_id
    AND status = 'pending'
  RETURNING user_id, amount
)
UPDATE user_quota q
SET reserved = q.reserved - t.amount,
    updated_at = NOW()
FROM target t
WHERE q.user_id = t.user_id;

COMMIT;

Release Used Quota Flow (Delete Path)

sql
UPDATE user_quota
SET used = used - :amount,
    updated_at = NOW()
WHERE user_id = :user_id
  AND used >= :amount;

Reservation Expiration

If a service reserves quota but never confirms (crash, abandoned upload), the reservation must be reclaimed:

sql
-- Worker loop: process in batches to avoid long transactions
WITH expired AS (
  UPDATE reservations
  SET status = 'expired'
  WHERE reservation_id IN (
    SELECT reservation_id
    FROM reservations
    WHERE status = 'pending'
      AND expires_at < NOW()
    ORDER BY expires_at
    LIMIT 1000
    FOR UPDATE SKIP LOCKED
  )
  RETURNING user_id, amount
)
UPDATE user_quota q
SET reserved = q.reserved - e.amount,
    updated_at = NOW()
FROM expired e
WHERE q.user_id = e.user_id;

Reservation TTL creates a trade-off: too short and legitimate slow uploads lose quota mid-upload; too long and abandoned reservations block available quota. Default to 30 minutes with configurable TTLs for large uploads.


Phase 5: Scaling & Trade-offs

1. Handling High Concurrency (Aggregate Load)

At peak loads of tens of thousands of QPS, a single PostgreSQL instance becomes a bottleneck due to the aggregate volume of atomic UPDATE queries. The issue isn't typically per-user "hot rows" (a single user rarely generates enough QPS to cause lock contention), but the total database write throughput.

Solution: Shard by User ID

Since each user's quota is independent, shard user_quota by user_id:

With 100 shards and users spread roughly evenly, each shard handles about 1/100th of the total QPS. Since each user's operations go to exactly one shard, the atomic UPDATE still works correctly with no cross-shard coordination.
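
Shard selection can be as simple as hashing the user ID; a sketch using Postgres's built-in hashtext() (the 100-shard count and modulo scheme are illustrative):

sql
-- Which of 100 logical shards owns this user? The routing layer computes the same value.
SELECT abs(hashtext('user_456')) % 100 AS shard_id;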

2. Why NOT a Cache Layer?

A natural instinct is to add Redis in front of the database. But for quota enforcement, this creates serious problems:

| Approach | Issue |
|---|---|
| Cache-aside | Two servers read "10 GB available" from cache, both proceed → overdraft |
| Write-through | Cache and DB can become inconsistent during failures |
| Redis as primary | Redis doesn't have durability guarantees for quota data |

For strict quota enforcement, the database IS the coordination layer. Adding a cache splits the atomic check-and-deduct across two systems and breaks the guarantee. Only cache the read path (GET /usage) where stale data is acceptable.

3. Replication Strategy

The "no overdraft" requirement forces strong consistency:

  • Synchronous replication for writes: Wait for a write quorum (for example, primary + one synchronous replica) before success. Preserves durability without requiring every replica.

  • Any replica for reads: Usage queries are non-critical; reading from any replica gives lower latency.

text
Write path (reserve):  Client → Primary → Write Quorum ACK → Response
Read path (usage):     Client → Any Replica → Response

Be explicit about this trade-off: "We sacrifice some write latency for strong consistency because the cost of overdraft (giving users unpaid storage) is higher than slightly slower quota checks. The p99 for a conditional row update inside a short transaction with synchronous replication is typically 10-20ms—well within our 50ms budget."
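
In PostgreSQL terms, quorum-style synchronous replication is a configuration choice; a sketch, with hypothetical standby names:

sql
-- Wait for any 1 of the two named standbys to confirm the WAL flush
-- before acknowledging a commit (quorum-based synchronous replication).
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (quota_replica_a, quota_replica_b)';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();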

4. Handling Reserved-but-Never-Used Quota

Three strategies work together:

TTL-based expiration (primary): Each reservation has expires_at. Background worker reclaims expired ones. Default: 30 min.

Heartbeat extension: For long-running operations, the upstream service periodically extends the reservation TTL. If heartbeats stop, the reservation expires normally.

Callback on failure: Upstream services should call cancel on failure (best-effort). TTL expiration is the safety net when callbacks fail.
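
The heartbeat extension boils down to pushing expires_at forward while the reservation is still pending; a minimal sketch (the 30-minute extension is illustrative):

sql
-- Called periodically by the upstream service while a long upload is in flight.
UPDATE reservations
SET expires_at = NOW() + INTERVAL '30 minutes'
WHERE reservation_id = :reservation_id
  AND status = 'pending';
-- 0 rows updated means the reservation already expired; the caller must re-reserve.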

5. Scaling Levels

| Scale | Architecture |
|---|---|
| < 10K QPS | Single PostgreSQL with connection pooling |
| 10K–100K QPS | Sharded PostgreSQL (10–50 shards) |
| 100K–1M QPS | Sharded PostgreSQL (50–200 shards) + read replicas |
| > 1M QPS | Per-service local quota budgets with periodic sync |

6. Per-Service Local Budgets (Advanced — Extreme Scale Only)

Central quota service allocates a "budget" to each upstream service. Each service checks locally—no network call needed. When budget runs low, request more from central.

Trade-off: Sacrifices strict per-byte enforcement for throughput. A user might briefly see different available quota from different services. Only use if strict consistency can be relaxed.
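
A sketch of how the central service might hand out a budget chunk, reusing the same conditional-update pattern; the service_budgets table and chunk semantics are assumptions for illustration:

sql
-- Grant :chunk_bytes of a user's quota to one service as a local budget.
-- Assumes UNIQUE (user_id, service_id) on service_budgets.
WITH granted AS (
  UPDATE user_quota
  SET reserved = reserved + :chunk_bytes,
      updated_at = NOW()
  WHERE user_id = :user_id
    AND (quota_limit - used - reserved) >= :chunk_bytes
  RETURNING user_id
)
INSERT INTO service_budgets (user_id, service_id, granted_bytes, granted_at)
SELECT user_id, :service_id, :chunk_bytes, NOW()
FROM granted
ON CONFLICT (user_id, service_id)
DO UPDATE SET granted_bytes = service_budgets.granted_bytes + EXCLUDED.granted_bytes,
              granted_at = NOW();
-- The service decrements its local copy per operation and asks for
-- another chunk only when the budget runs low.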


Common Pitfalls

Using read-then-write instead of atomic updates — Checking available quota in one query and deducting in another creates a TOCTOU race condition. Two concurrent requests can both pass the check and both deduct, causing overdraft.
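
To make the race concrete, here is the broken read-then-write sequence next to the atomic version (a sketch; parameter names match the earlier flows):

sql
-- BROKEN: check and deduct in separate statements (TOCTOU window in between).
SELECT quota_limit - used - reserved AS available
FROM user_quota
WHERE user_id = :user_id;
-- ...application compares :amount to available, then...
UPDATE user_quota
SET reserved = reserved + :amount
WHERE user_id = :user_id;

-- SAFE: the check and the deduction are one atomic statement.
UPDATE user_quota
SET reserved = reserved + :amount
WHERE user_id = :user_id
  AND (quota_limit - used - reserved) >= :amount;
-- 0 rows updated means insufficient quota; no window for a concurrent request.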

Adding a cache for quota enforcement — Caching introduces stale reads that break strict consistency. The database is the coordination point. Only cache for non-critical read paths.

Forgetting reservation expiration — Without TTLs on reservations, crashed services permanently "leak" quota. Always implement a background expiration worker.

Sharding by service instead of user — A global user quota would be split across service-owned shards, forcing cross-shard coordination for every reserve check. Shard by user_id so each user's quota is authoritative in one place.


Interview Checklist

| Topic | Key Points to Cover |
|---|---|
| Strict enforcement | Conditional UPDATE inside a transaction prevents overdraft |
| Reserve-confirm-cancel | Explicit cancel path handles failed/aborted in-flight operations |
| Reservation TTL | Background worker reclaims abandoned reservations |
| No cache for writes | Cache introduces consistency gaps; DB is the coordination layer |
| Sharding | Shard by user_id to distribute aggregate write load |
| Replication | Synchronous writes for durability, async reads for performance |
| Back-of-envelope | Tiny data (1 GB), compute-bound problem, not storage-bound |
| Scaling levels | Single DB → Sharded → Local budgets depending on QPS |

Summary

| Aspect | Solution | Rationale |
|---|---|---|
| Core abstraction | UserQuota + Reservations | Separates confirmed usage from in-flight operations |
| Overdraft prevention | Conditional UPDATE + transactional insert | No TOCTOU race and no orphaned reservation state |
| In-flight operations | Reserve → Confirm / Cancel | TTL expiration reclaims abandoned quota |
| Concurrency | Shard by user_id | Distributes aggregate write load across DB shards |
| Consistency | Strong for writes, eventual for reads | No-overdraft guarantee requires strong writes |
| Scaling | Stateless servers + sharded PostgreSQL | Handles 10K–1M QPS with horizontal scaling |
| Extreme scale | Per-service local budgets | Trades strict consistency for throughput |