
Push Notification System for New Posts

Role: Software Engineer


Design a notification system that sends push notifications to followers when an author publishes a new post. Followers may have multiple devices, can mute or disable notifications, and some authors can have millions of followers.

This walkthrough follows the Interview Framework and focuses on what you would actually present in a 45-60 minute interview.

Keep the scope tight. This question is about reliable notification fanout and delivery, not about designing the full feed-ranking or recommendation system.

Phase 1: Requirements (~5 minutes)

Functional Requirements

  • Users should be able to publish a new post that triggers notifications to eligible followers

  • Followers should be able to receive push notifications on their registered devices in near real time

  • Users should be able to configure notification preferences such as enable/disable, mute, and quiet hours

  • The system should avoid duplicate notifications and retry transient delivery failures safely

  • Users should be able to open the notification and deep-link to the new post, while the system tracks delivery/open status

Features like feed ranking, email digests, mention notifications, and recommendation logic are useful follow-ups, but they should stay below the line in the initial design.

Non-Functional Requirements

| Requirement | Target | Rationale |
| --- | --- | --- |
| Latency | p50 under 5s, p99 under 30s for normal authors | Notifications should feel real time |
| Availability | 99.9%+ | Users expect post alerts to work consistently |
| Durability | No lost post events | Missing notifications are hard to recover from |
| Correctness | No duplicate pushes for the same post/follower/channel | Duplicate alerts destroy trust quickly |
| Scalability | 30M DAU, 15M posts/day, celebrity fanout up to 20M followers | The hot-author case dominates the design |
| Cost efficiency | Minimize unnecessary provider sends | Push providers and fanout pipelines cost real money |

The main insight is that this is not a simple "send one message" system. It is a massive fanout pipeline with filtering, deduplication, rate limiting, and third-party delivery constraints.

Capacity Estimation

text
Assumptions:
- Daily active users: 30 million
- Posts per day: 15 million
- Average opted-in followers per post: 200
- Candidate notifications per day: 15M * 200 = 3B

Traffic:
- Average posts/sec: 15M / 86,400 ~= 174
- Average notifications/sec: 3B / 86,400 ~= 35K
- Peak notifications/sec (10x burst): 350K+
- Celebrity post: up to 20M recipients for a single post

Storage:
- Compact notification metadata per recipient: ~250 bytes
- Daily hot storage: 3B * 250 bytes ~= 750 GB/day
- 7-day hot retention ~= 5.25 TB before replication

Do not store the fully rendered push payload for every recipient. Store compact metadata such as template ID, actor ID, post ID, and state. Render the final provider payload at dispatch time.
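As a concrete sketch of that split, a compact per-recipient record plus render-at-dispatch might look like this (field names and the template are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

# Compact per-recipient record: only IDs and state, never the rendered payload.
@dataclass
class NotificationRecord:
    template_id: str   # e.g. "new_post_v1"
    actor_id: str      # author who triggered the notification
    post_id: str
    follower_id: str
    status: str = "pending"

# Render the provider payload only at dispatch time, from a shared template.
TEMPLATES = {
    "new_post_v1": "{author_name} just published a new post",
}

def render_payload(record: NotificationRecord, author_name: str) -> dict:
    body = TEMPLATES[record.template_id].format(author_name=author_name)
    return {
        "title": "New post",
        "body": body,
        "deep_link": f"app://posts/{record.post_id}",
    }
```

Storing ~250 bytes of metadata instead of a full rendered payload is what keeps the 3B-row/day hot store at roughly 750 GB/day.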

Phase 2: Data Model (~5 minutes)

Core Entities

The main entities are Post, FollowEdge (with per-author notification overrides), UserNotificationPreference (global settings such as quiet hours), Device (registered push tokens), Notification (the follower-level record), NotificationAttempt (the per-device delivery log), and FanoutJob (the shard-level unit of fanout work).

Notification Lifecycle

Provider acceptance is not the same as user-visible delivery. APNs or FCM may accept the message, but the device can still be offline, the token can be stale, or the user may never open the push.

Notification is the follower-level logical record. NotificationAttempt stores per-device outcomes. The parent Notification.status is aggregated from child attempts: dispatched means at least one device was accepted by a provider, opened means any device opened it, and exhausted means all target devices permanently failed or the retry budget was spent.

Quiet hours usually mean defer, not drop. Use scheduled_for to store the first eligible delivery time and place the notification onto a delayed queue or scheduled partition until that time arrives.
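A minimal sketch of that aggregation rule (the state names are illustrative):

```python
def aggregate_status(attempt_states: list) -> str:
    """Roll per-device NotificationAttempt states up into Notification.status.

    Illustrative attempt states: "pending", "accepted" (provider accepted),
    "opened", "failed_permanent" (e.g. invalid token).
    """
    if any(s == "opened" for s in attempt_states):
        return "opened"          # any device opened it
    if any(s == "accepted" for s in attempt_states):
        return "dispatched"      # at least one device accepted by a provider
    if attempt_states and all(s == "failed_permanent" for s in attempt_states):
        return "exhausted"       # every target device permanently failed
    return "pending"
```

Note that a notification with one accepted device and one dead token still counts as dispatched; exhausted requires every device to fail.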

Phase 3: API Design (~5 minutes)

Protocol Choice

  • REST for post creation, device registration, and preference management

  • Durable event stream / queue for internal async fanout

  • APNs / FCM / Web Push for actual device delivery

WebSocket is not the primary protocol here because the core requirement is offline-capable push delivery. If the product also needs a live in-app notification center, you can add WebSocket or SSE later as a secondary channel.

Client-Facing APIs

http
# Create a post
POST /api/posts
Content-Type: application/json

{
  "text": "We just launched a new feature",
  "visibility": "public",
  "notify_followers": true
}

Response:
{
  "post_id": "post_123",
  "created_at": "2026-03-12T22:10:00Z"
}

# Register or refresh a device token
POST /api/devices
{
  "provider": "apns",
  "platform": "ios",
  "push_token": "token_abc"
}

# Update global push preferences
PUT /api/notification-settings
{
  "push_enabled": true,
  "quiet_hours_start": 22,
  "quiet_hours_end": 7,
  "timezone": "America/Los_Angeles"
}

# Update post-notification preferences for an author
PUT /api/follows/{author_id}/notification-settings
{
  "notify_on_post": true,
  "muted": false
}

# Fetch notification history / in-app inbox
GET /api/notifications?cursor=notif_456&limit=50

# Client acknowledges that a notification was opened
POST /api/notifications/{notification_id}/ack
{
  "device_id": "dev_789",
  "event": "opened"
}

Internal Events

json
// Published after the post transaction commits
{
  "event_type": "post_created",
  "post_id": "post_123",
  "author_id": "user_42",
  "visibility": "public",
  "created_at": "2026-03-12T22:10:00Z"
}

// Fanout shard job
{
  "job_id": "job_555",
  "post_id": "post_123",
  "author_id": "user_42",
  "shard_id": 18,
  "cursor": "follower_9000000"
}

Use a transactional outbox between POST /api/posts and the post_created event. Otherwise you risk writing the post successfully but losing the notification trigger if the process crashes before publishing to Kafka or your queue.
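A minimal outbox sketch, using SQLite as a stand-in for the posts database (table and column names are illustrative):

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE posts (id TEXT PRIMARY KEY, author_id TEXT, text TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         event_type TEXT, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def create_post(post_id: str, author_id: str, text: str) -> None:
    # Post row and outbox row commit atomically: either both exist or neither.
    with db:
        db.execute("INSERT INTO posts VALUES (?, ?, ?)",
                   (post_id, author_id, text))
        db.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("post_created",
             json.dumps({"post_id": post_id, "author_id": author_id})),
        )

create_post("post_123", "user_42", "We just launched a new feature")
```

A separate relay process then polls unpublished outbox rows, publishes them to the bus, and marks them published only after the broker acknowledges, so a crash at any point leaves the event recoverable rather than lost.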

Phase 4: High-Level Design

Architecture Overview

Write and Fanout Flow

  • User creates a post through POST /api/posts

  • Post Service stores the post and writes a post_created outbox row in the same transaction

  • Outbox relay publishes the event to a durable bus such as Kafka, Pulsar, or SQS + SNS

  • Notification Orchestrator validates that the post is eligible for notifications and creates fanout shard jobs

  • Fanout Workers scan follower shards for that author, apply per-author overrides plus global quiet-hour filters, and create idempotent notification records backed by a unique key in the Notification Store

  • If the follower is inside quiet hours, the worker sets scheduled_for and pushes the notification onto the delayed side of the dispatch queue; otherwise it is ready immediately

  • Dispatch Workers fetch active device tokens from the Device Registry, batch by provider, and send push requests through APNs, FCM, or Web Push

  • Mobile or web clients deep-link into the post and optionally acknowledge opened back to the Notification API

Key Components

| Component | Responsibility | Notes |
| --- | --- | --- |
| Post Service | Writes posts and outbox rows | Strong consistency at write time |
| Follower Graph | Stores followers by (author_id, shard_id) | Prevents celebrity authors from becoming one hot partition |
| Preference Store | Global push settings plus per-author overrides | Quiet hours are user-level; mute/follow overrides are edge-level |
| Device Registry | Stores active push tokens per user/device | Dispatch uses it for token lookup and invalid-token cleanup |
| Notification Orchestrator | Decides how to fan out a post | Splits normal vs celebrity handling |
| Notification Store | Persists lifecycle state and inbox data | Unique key on (post_id, follower_id, channel) is the dedupe source of truth |
| Dispatch Workers | Send to providers with retry logic | Batch by provider and platform |
| DLQ | Holds poison messages or repeated failures | Required for safe recovery |

| Data | Store | Reasoning |
| --- | --- | --- |
| Posts | PostgreSQL / MySQL | Transactional writes and product metadata |
| Follower graph | Cassandra / Bigtable / DynamoDB | Bucket followers by (author_id, shard_id) for scalable fanout scans |
| Device registry | PostgreSQL / DynamoDB | Active token lookup by user plus invalid-token updates |
| Notification state | Cassandra / DynamoDB | Very high write throughput with TTL |
| Provider rate limits + optional prefilter | Redis | Fast token buckets, counters, and hot retry suppression |
| Async transport | Kafka / Pulsar / SQS | Durable, replayable fanout pipeline |

Store per-author overrides such as notify_on_post and muted on the follow edge, but keep global quiet hours in a separate user preference record. That avoids rewriting every follow row when a user changes timezone or sleep schedule.

Redis can be used as an optional prefilter for hot retries, but the authoritative dedupe guarantee should come from the unique key in the Notification Store.
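A toy illustration of that authoritative dedupe, using SQLite's conflict handling in place of a distributed store's conditional write (DynamoDB condition expressions or Cassandra lightweight transactions play the same role):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The unique key (post_id, follower_id, channel) is the dedupe source of truth.
db.execute("""
    CREATE TABLE notifications (
        post_id TEXT, follower_id TEXT, channel TEXT, status TEXT,
        PRIMARY KEY (post_id, follower_id, channel)
    )
""")

def create_notification(post_id: str, follower_id: str, channel: str) -> bool:
    """Idempotent create: True if this call inserted the row, False if a
    retried fanout job already created it."""
    cur = db.execute(
        "INSERT OR IGNORE INTO notifications VALUES (?, ?, ?, 'pending')",
        (post_id, follower_id, channel),
    )
    return cur.rowcount == 1
```

Because creation is idempotent, queue redeliveries and worker retries collapse into a no-op instead of a duplicate push.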

Phase 5: Scaling and Trade-offs (~15-20 minutes)

Deep Dive 1: Celebrity Fanout

The hard case is an author with millions of followers. A naive single job that loads all followers and sends all pushes will fail due to memory pressure, retry storms, and provider throttling.

Use a sharded campaign model:

python
# Choose a fanout strategy by audience size (thresholds are illustrative)
if opted_in_follower_count < 100_000:
    enqueue_all_shards(post_id, priority="high")  # small audience: fan out at once
else:
    # celebrity audience: fixed-size shards, paced to respect provider quotas
    create_campaign(post_id, shard_size=50_000, paced_dispatch=True)

For celebrity posts:

  • Split followers into deterministic buckets by (author_id, shard_id)

  • Pace shard execution so provider quotas are respected

  • Prioritize recently active followers first if the product allows it

  • Keep shard jobs idempotent so retries do not create duplicate pushes

Do not scan "all followers of a celebrity" in one database request. That creates a hot partition and gives you no recovery point. Shard the fanout work and checkpoint progress.
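One way to sketch the sharded, checkpointed scan (the shard count, page size, and the scan/checkpoint/enqueue interfaces are assumptions, not a fixed API):

```python
import zlib

SHARD_COUNT = 64  # illustrative; sized to the follower store in practice

def follower_shard(follower_id: str) -> int:
    """Deterministic bucket so the same follower always lands in one shard."""
    return zlib.crc32(follower_id.encode()) % SHARD_COUNT

def run_shard_job(job: dict, scan, checkpoint, enqueue, page_size: int = 1000):
    """Drain one (author_id, shard_id) bucket in pages with a resumable cursor.

    Assumed interfaces:
      scan(author_id, shard_id, cursor, limit) -> (followers, next_cursor or None)
      checkpoint(job_id, cursor)  -- persists the recovery point
      enqueue(post_id, follower_id)  -- idempotent thanks to the dedupe key
    """
    cursor = job["cursor"]
    while True:
        followers, cursor = scan(job["author_id"], job["shard_id"],
                                 cursor, page_size)
        for follower_id in followers:
            enqueue(job["post_id"], follower_id)
        checkpoint(job["job_id"], cursor)  # recovery point after each page
        if cursor is None:
            break
```

A crashed worker restarts from the last checkpointed cursor instead of rescanning the whole bucket, and re-enqueued followers are absorbed by the notification store's dedupe key.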

Deep Dive 2: Delivery Semantics

You cannot guarantee true exactly-once push delivery because the queue, workers, and providers all operate with at-least-once behavior. The practical design is:

  • Effectively-once logical notification creation via a unique key in the Notification Store

  • At-least-once dispatch attempts with idempotent provider requests where possible

  • Best-effort user-visible delivery because device state is outside your control

This is why the Notification row is the canonical follower-level record and NotificationAttempt rows are the per-device delivery log.

Deep Dive 3: Preference Filtering and Timing

A follower may:

  • Disable post notifications entirely

  • Mute only a specific author

  • Enter quiet hours in their own timezone

  • Unfollow or block the author after the post is created

The safest rule is to filter as late as possible, during shard expansion or just before dispatch. That reduces stale sends caused by race conditions between post creation and preference changes.

For quiet hours specifically, late filtering usually means defer instead of suppress: create the notification row, compute the next valid send time from the user's timezone, set scheduled_for, and let the delayed queue release it later.
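A sketch of computing scheduled_for from the quiet-hours settings in the API above (hour granularity only; DST edge cases are ignored for brevity):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")

def next_send_time(now_utc: datetime, tz: str,
                   quiet_start: int, quiet_end: int) -> datetime:
    """First eligible send time: now, or the end of the user's quiet hours.

    quiet_start/quiet_end are local hours, e.g. 22 and 7 for a 10pm-7am
    window, matching the PUT /api/notification-settings payload.
    """
    local = now_utc.astimezone(ZoneInfo(tz))
    if quiet_start > quiet_end:  # window crosses midnight, e.g. 22 -> 7
        in_quiet = local.hour >= quiet_start or local.hour < quiet_end
    else:
        in_quiet = quiet_start <= local.hour < quiet_end
    if not in_quiet:
        return now_utc
    release = local.replace(hour=quiet_end, minute=0, second=0, microsecond=0)
    if quiet_start > quiet_end and local.hour >= quiet_start:
        release += timedelta(days=1)  # still before midnight: release tomorrow
    return release.astimezone(UTC)
```

The dispatch layer stores the returned value as scheduled_for and the delayed queue releases the notification once that instant passes.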

Trade-off:

  • Filter once at event creation: cheaper, but stale

  • Filter late during fanout/dispatch: more reads, but more correct

For user trust, late filtering is usually worth the extra cost.

Deep Dive 4: Provider Failures and Token Hygiene

Push providers fail in different ways:

  • Temporary errors: retry with exponential backoff and jitter

  • Permanent errors: mark device token invalid and stop sending

  • Slow provider region: shift traffic if multi-region routing is available

Important practices:

  • Batch sends by provider and platform

  • Cap retry age so a "new post" push is not delivered hours later

  • Send invalid-token events back to Device Service for cleanup
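The retry policy described above can be sketched as follows (the constants are illustrative):

```python
import random

MAX_RETRY_AGE_S = 3600   # cap: a "new post" push older than this is dropped
BASE_DELAY_S = 2
MAX_DELAY_S = 300

def next_retry_delay(attempt: int) -> float:
    """Exponential backoff with full jitter:
    uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt)))

def should_retry(error_kind: str, age_seconds: float) -> bool:
    """Retry transient provider errors only while the push is still fresh."""
    if error_kind == "permanent":  # e.g. unregistered token: clean up, never retry
        return False
    return age_seconds < MAX_RETRY_AGE_S
```

Full jitter spreads retries out so a provider blip does not turn into a synchronized retry storm, and the age cap enforces the rule that a stale "new post" alert is worse than no alert.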

Availability and Multi-Region

For high availability:

  • Run stateless API, orchestration, and dispatch workers in multiple regions

  • Keep the event bus replicated or use region-local queues with mirrored failover

  • Store notifications in a replicated database with regional failover

  • Avoid cross-region synchronous calls in the hot path; only the initial post write needs strong consistency

The follower graph and notification store can usually tolerate eventual consistency across regions. Missing a few milliseconds of replica lag is far better than slowing down the entire post path.

Common Pitfalls

Confusing provider acceptance with delivery. APNs or FCM returning success only means they accepted the message, not that the user saw it.

No transactional outbox. Writing the post to the database and publishing the event separately creates a classic lost-notification failure mode.

Ignoring hot authors. A design that works for 1,000 followers often collapses for 10 million followers unless fanout is sharded and paced.

No dedupe key. Retries in the queue or worker layer will create duplicate pushes without an idempotent notification record.

Filtering too early. If you only evaluate mute, block, and quiet-hour settings at post creation time, users can still receive notifications they turned off seconds later.

Interview Checklist

Before wrapping up, verify you covered:

Requirements Phase

  • Core scope is new-post push notifications, not the full feed system

  • Functional requirements include preferences, retries, and dedupe

  • Non-functional requirements include latency, correctness, and celebrity spikes

  • Quick capacity estimate shows fanout scale

Data Model

  • Post, FollowEdge, UserNotificationPreference, Device, Notification, NotificationAttempt, FanoutJob

  • Unique dedupe key explained

  • Follower-level state separated from per-device attempt history

API Design

  • REST APIs for posts, devices, preferences, and acknowledgements

  • Internal post_created event defined

  • Transactional outbox justified

High-Level Design

  • Architecture diagram with fanout and dispatch pipeline

  • Follower graph, preference model, and notification store explained

  • APNs / FCM integration covered

Scaling and Trade-offs

  • Celebrity fanout strategy explained

  • At-least-once vs exactly-once trade-off discussed

  • Provider retry behavior and invalid token cleanup covered

  • Multi-region availability and late preference filtering mentioned

Summary

| Aspect | Recommendation | Rationale |
| --- | --- | --- |
| Post trigger | Transactional outbox | Prevent lost notification events |
| Fanout strategy | Sharded jobs by (author_id, shard_id) bucket | Handles celebrity-scale writes safely |
| Follower store | Wide-column KV by (author_id, shard_id) | Efficient follower scans without hot partitions |
| Notification record | Canonical row with unique dedupe key | Prevents duplicate pushes |
| Delivery log | Separate attempt table | Tracks retries and provider outcomes |
| Dispatch layer | Batched workers with rate limiting | Respects APNs / FCM quotas |
| Preference handling | Global user prefs + per-author overrides, filtered late | Better correctness without rewriting wide follow edges |
| Reliability model | At-least-once attempts, idempotent creation | Practical and robust |

The strongest answer here is not "use Kafka and push notifications." It is showing that you understand the real bottlenecks: fanout amplification, celebrity hot keys, provider throttling, idempotency, and user-preference correctness.