Design Twitter (X): A System Design Interview Guide

Designing a system like Twitter (now X) is a classic system design interview question because it touches on almost every aspect of distributed systems: heavy read/write loads, eventual consistency, complex data modeling, and the famous "fan-out" problem.
In this guide, we will architect a platform capable of handling 500 million daily active users (DAU) and billions of tweets per day.
1. Requirements & Estimation
Before writing code or drawing boxes, we must define the scale.
Functional Requirements
- Post Tweet: Users can post short text/media messages.
- Home Timeline: Users can view a feed of tweets from people they follow.
- Follow/Unfollow: Users can follow others.
- Search: Users can search tweets by keywords (out of scope for this article; we focus on the feed).
Capacity Estimation (Back-of-the-Envelope)
- DAU: 500 Million.
- Reads: Each user visits their timeline 5 times/day -> 2.5 Billion reads/day (~29k QPS).
- Writes: Users post 100 Million tweets/day (~1.2k QPS).
- Ratio: Read-heavy system (High Read-to-Write ratio).
Conclusion: We need a system optimized for fast reads.
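The arithmetic behind these estimates is worth writing out; a quick sanity-check script makes the read-heavy conclusion concrete:

```python
# Back-of-the-envelope numbers from the estimates above.
DAU = 500_000_000
SECONDS_PER_DAY = 86_400

reads_per_day = DAU * 5            # 5 timeline visits per user per day
writes_per_day = 100_000_000       # tweets posted per day

read_qps = reads_per_day / SECONDS_PER_DAY    # ~29k QPS
write_qps = writes_per_day / SECONDS_PER_DAY  # ~1.2k QPS

print(f"Read QPS:  {read_qps:,.0f}")   # Read QPS:  28,935
print(f"Write QPS: {write_qps:,.0f}")  # Write QPS: 1,157
print(f"Read/Write ratio: {read_qps / write_qps:.0f}:1")  # 25:1
```

A 25:1 read-to-write ratio (before counting fan-out) is why every design decision below favors cheap reads, even at the cost of expensive writes.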
2. High-Level Architecture
We will use a Microservices architecture to separate concerns.
Key Components
- User Service: Profile management, authentication.
- Tweet Service: Storage and retrieval of tweet content.
- Social Graph Service: Tracks who follows whom.
- Timeline Service: The most complex piece. Generates and retrieves news feeds.
3. Data Modeling
Tweets & Scale (The "Snowflake" ID)
We cannot rely on a single database's auto-incrementing ID. We need a global, unique, sortable ID generator. Twitter uses Snowflake:
- 64-bit integer.
- Sorting by ID roughly equates to sorting by time.
- Distributed generation without coordination.
Database Choice
- User Data: MySQL/PostgreSQL. Account and profile data benefits from relational modeling and ACID guarantees (strict consistency matters far more for accounts than for individual tweets).
- Tweet Data: Cassandra or DynamoDB. Why? Massive write throughput, simple key-value structure (TweetID -> Data), and linear horizontal scalability.
- Social Graph: Graph Database (Neo4j) or specific key-value store optimized for adjacency lists.
4. The Core Challenge: Timeline Generation
How do we efficiently show a user a feed of tweets from the 500 people they follow? There are three main approaches.
Approach A: Pull Model (Fan-out on Read)
When User A checks their feed:
- Fetch IDs of everyone User A follows.
- Fetch recent tweets for all those IDs (e.g., SELECT * FROM tweets WHERE user_id IN (...)).
- Merge and sort in memory.
- Pros: Simple implementation. NRT (Near Real-Time).
- Cons: High latency. If a user follows 2,000 people, every timeline load triggers heavy multi-user queries at read time. This does not scale for a read-heavy system.
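The read path above amounts to a k-way merge of each followee's (already time-sorted) tweet lists. A sketch, using an in-memory dict as a stand-in for the tweet store (all names here are illustrative):

```python
import heapq

# Each user's tweets are stored newest-first; Snowflake IDs sort by time,
# so merging by ID merges by recency.
tweets_by_user = {
    "alice": [103, 101],
    "bob":   [104, 100],
    "carol": [102],
}

def pull_timeline(followees: list[str], limit: int = 10) -> list[int]:
    """Fan-out on read: fetch every followee's tweets, k-way merge by ID."""
    lists = [tweets_by_user.get(u, []) for u in followees]
    merged = heapq.merge(*lists, reverse=True)  # inputs are newest-first
    return list(merged)[:limit]

print(pull_timeline(["alice", "bob", "carol"]))  # [104, 103, 102, 101, 100]
```

The merge itself is cheap; the cost that kills this model is the 2,000 storage fetches hiding behind `tweets_by_user.get` on every single page load.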
Approach B: Push Model (Fan-out on Write)
When User B posts a tweet:
- Find all followers of User B.
- Insert the tweet ID into a cached timeline list (e.g., Redis List) for each follower.
- Pros: Zero Latency on Read. The timeline is pre-computed.
- Cons: Write Amplification. If Justin Bieber (100M followers) tweets, the system must perform 100M cache writes for a single post. This is the "celebrity" fan-out problem.
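Fan-out on write is a loop over followers at post time. The sketch below uses plain dicts as stand-ins for the social graph and the Redis timeline cache; the trim mirrors Redis's LPUSH + LTRIM pattern:

```python
from collections import defaultdict

followers = {"bob": ["alice", "carol"]}          # social graph stand-in
timelines: dict[str, list[int]] = defaultdict(list)  # Redis stand-in

MAX_TIMELINE = 800  # keep only the hot head of each cached timeline

def post_tweet(author: str, tweet_id: int) -> None:
    """Fan-out on write: push the new ID to every follower's cache."""
    for follower in followers.get(author, []):
        cache = timelines[follower]
        cache.insert(0, tweet_id)   # newest first, like LPUSH
        del cache[MAX_TIMELINE:]    # bound memory, like LTRIM

post_tweet("bob", 101)
post_tweet("bob", 102)
print(timelines["alice"])  # [102, 101]
```

With two followers the loop is trivial; with 100M it becomes 100M cache writes dispatched through a queue, which is exactly the amplification the cons bullet describes.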
Approach C: The Hybrid Strategy (Winner)
We combine the two.
- Normal Users: Use Push Model. Their tweet is pushed to their few hundred followers' caches immediately.
- Celebrities (VIPs): Use Pull Model. Justin Bieber's tweets are not pushed. Instead, when a user views their timeline, they pull Bieber's tweets separately and merge them.
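The hybrid read path then merges two sources: the follower's pre-computed cache and the celebrity tweets pulled live. A sketch (the threshold value is an illustrative assumption, not a published figure):

```python
CELEBRITY_THRESHOLD = 1_000_000  # assumed cutoff above which we stop pushing

def read_timeline(pushed: list[int], celebrity_tweets: list[int],
                  limit: int = 10) -> list[int]:
    """Merge the pushed (cached) timeline with live-pulled VIP tweets.

    Both inputs are newest-first lists of Snowflake IDs, so sorting the
    union descending interleaves them correctly by time.
    """
    return sorted(pushed + celebrity_tweets, reverse=True)[:limit]

pushed = [105, 103, 100]  # from the follower's Redis cache
vip = [104, 102]          # pulled live from the tweet store
print(read_timeline(pushed, vip))  # [105, 104, 103, 102, 100]
```

The key property: the expensive pull is now bounded by the handful of VIPs a user follows, not by their full followee list.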
5. Storage Optimization: Sharding
We must shard our databases. Sharding by User ID vs Tweet ID?
- Sharding by User ID: All tweets for a user live on one shard.
- Pro: Fast to fetch all tweets for one user.
- Con: "Hot Partition" problem (Celebrities).
- Sharding by Tweet ID: Tweets are distributed randomly (or by Snowflake time).
- Pro: Even distribution of load.
- Con: Fetching timeline requires querying all shards (Scatter-Gather).
Industry Standard: Often a mix; time-based sharding (via Snowflake IDs) combined with secondary indexes on user ID keeps both access patterns workable.
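The routing difference between the two schemes can be shown in a few lines. This sketch hashes keys to a fixed shard count (shard count and key formats are illustrative):

```python
import hashlib

NUM_SHARDS = 16

def shard_for(key: str) -> int:
    """Stable hash routing: the same key always lands on the same shard."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# User-ID sharding: all of one user's tweets live on a single shard.
user_shard = shard_for("user:12345")

# Tweet-ID sharding: 100 tweets scatter across shards, so a timeline
# read must query many of them (scatter-gather).
shards_touched = {shard_for(f"tweet:{i}") for i in range(100)}
print(user_shard, len(shards_touched))  # second number is close to NUM_SHARDS
```

The trade-off is visible directly: user-ID routing concentrates a celebrity's read load on one shard (hot partition), while tweet-ID routing spreads load but forces every read to fan out.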
6. Caching & Reliability
Redis Strategy
The timeline should be stored as a Redis List or Sorted Set.
- Key: timeline:{user_id}
- Value: List of tweet_ids.
- We only need to cache the last 800 tweets per user. Older tweets can be fetched from the DB on demand (scroll).
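The cache contract (bounded, newest-first, paged reads) can be sketched with a plain dict standing in for Redis sorted sets; the comments note the corresponding Redis commands:

```python
MAX_CACHED = 800  # cap per the design above

# key -> [(score, tweet_id), ...], newest-first; a stand-in for a Redis ZSET.
cache: dict[str, list[tuple[int, int]]] = {}

def add_to_timeline(user_id: int, tweet_id: int) -> None:
    """Like ZADD with the Snowflake ID doubling as the score, then a trim."""
    entries = cache.setdefault(f"timeline:{user_id}", [])
    entries.append((tweet_id, tweet_id))
    entries.sort(reverse=True)
    del entries[MAX_CACHED:]  # like ZREMRANGEBYRANK, keeping the newest 800

def read_page(user_id: int, offset: int, count: int) -> list[int]:
    """Like ZREVRANGE offset offset+count-1: one page of the timeline."""
    entries = cache.get(f"timeline:{user_id}", [])
    return [tid for _, tid in entries[offset:offset + count]]

add_to_timeline(42, 101)
add_to_timeline(42, 105)
add_to_timeline(42, 103)
print(read_page(42, 0, 2))  # [105, 103]
```

Reads past the 800-entry cap fall through to the tweet store, which is the cold path the "scroll" bullet refers to.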
Replication
- Primary-Replica: Master accepts writes, Replicas serve reads.
- Quorum Writes (Cassandra): With a replication factor of 3, require acknowledgment from at least 2 nodes (a quorum) before a write succeeds, ensuring durability while tolerating one node failure.
7. Summary
Designing Twitter requires handling conflicting constraints. The Hybrid Fan-out architecture allows us to serve 99% of users with pre-computed (O(1)) timelines while preventing system collapse when celebrities tweet.
By leveraging Redis for hot timelines, Cassandra for massive write ingestion, and a Snowflake ID generator for distributed sorting, we build a system that is both resilient and lightning fast.