Design Twitter (X): A System Design Interview Guide

Designing a system like Twitter (now X) is a classic system design interview question because it touches on almost every aspect of distributed systems: heavy read/write loads, eventual consistency, complex data modeling, and the famous "fan-out" problem.
In this guide, we will architect a platform capable of handling 500 million daily active users (DAU) and billions of tweets per day.
1. Requirements & Estimation
Before writing code or drawing boxes, we must define the scale.
Functional Requirements
- Post Tweet: Users can post short text/media messages.
- Home Timeline: Users can view a feed of tweets from people they follow.
- Follow/Unfollow: Users can follow others.
- Search: Users can search tweets by keywords (out of scope for this article; we focus on the feed).
Capacity Estimation (Back-of-the-Envelope)
- DAU: 500 Million.
- Reads: Each user visits their timeline 5 times/day -> 2.5 Billion reads/day (~29k QPS).
- Writes: Users post 100 Million tweets/day (~1.2k QPS).
- Ratio: Read-heavy system (High Read-to-Write ratio).
Conclusion: We need a system optimized for fast reads.
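The arithmetic behind these estimates is worth writing out; a quick sanity-check script makes the read-heavy conclusion concrete:

```python
# Back-of-the-envelope numbers from the estimates above.
DAU = 500_000_000
SECONDS_PER_DAY = 86_400

reads_per_day = DAU * 5            # 5 timeline visits per user per day
writes_per_day = 100_000_000       # tweets posted per day

read_qps = reads_per_day / SECONDS_PER_DAY    # ~29k QPS
write_qps = writes_per_day / SECONDS_PER_DAY  # ~1.2k QPS

print(f"Read QPS:  {read_qps:,.0f}")   # Read QPS:  28,935
print(f"Write QPS: {write_qps:,.0f}")  # Write QPS: 1,157
print(f"Read/Write ratio: {read_qps / write_qps:.0f}:1")  # 25:1
```

A 25:1 read-to-write ratio (before counting fan-out) is why every design decision below favors cheap reads, even at the cost of expensive writes.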
2. High-Level Architecture
We will use a Microservices architecture to separate concerns.
Key Components
- User Service: Profile management, authentication.
- Tweet Service: Storage and retrieval of tweet content.
- Social Graph Service: Tracks who follows whom.
- Timeline Service: The most complex piece. Generates and retrieves news feeds.
3. Data Modeling
Tweets & Scale (The "Snowflake" ID)
We cannot rely on a single database's auto-incrementing ID. We need a global, unique, sortable ID generator. Twitter uses Snowflake:
- 64-bit integer.
- Sorting by ID roughly equates to sorting by time.
- Distributed generation without coordination.
Database Choice
- User Data: MySQL/PostgreSQL. Account and profile data benefits from relational modeling and ACID guarantees (strict consistency matters far more for accounts than for individual tweets).
- Tweet Data: Cassandra or DynamoDB. Why? Massive write throughput, simple key-value structure (TweetID -> Data), and linear horizontal scalability.
- Social Graph: Graph Database (Neo4j) or specific key-value store optimized for adjacency lists.
4. The Core Challenge: Timeline Generation
How do we efficiently show a user a feed of tweets from the 500 people they follow? There are three main approaches.
Approach A: Pull Model (Fan-out on Read)
When User A checks their feed:
- Fetch IDs of everyone User A follows.
- Fetch recent tweets for all those IDs (e.g., SELECT * FROM tweets WHERE user_id IN (...)).
- Merge and sort in memory.
- Pros: Simple implementation. NRT (Near Real-Time).
- Cons: High latency. If a user follows 2,000 people, every timeline load triggers heavy multi-user queries at read time. This does not scale for a read-heavy system.
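The read path above amounts to a k-way merge of each followee's (already time-sorted) tweet lists. A sketch, using an in-memory dict as a stand-in for the tweet store (all names here are illustrative):

```python
import heapq

# Each user's tweets are stored newest-first; Snowflake IDs sort by time,
# so merging by ID merges by recency.
tweets_by_user = {
    "alice": [103, 101],
    "bob":   [104, 100],
    "carol": [102],
}

def pull_timeline(followees: list[str], limit: int = 10) -> list[int]:
    """Fan-out on read: fetch every followee's tweets, k-way merge by ID."""
    lists = [tweets_by_user.get(u, []) for u in followees]
    merged = heapq.merge(*lists, reverse=True)  # inputs are newest-first
    return list(merged)[:limit]

print(pull_timeline(["alice", "bob", "carol"]))  # [104, 103, 102, 101, 100]
```

The merge itself is cheap; the cost that kills this model is the 2,000 storage fetches hiding behind `tweets_by_user.get` on every single page load.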
Approach B: Push Model (Fan-out on Write)
When User B posts a tweet:
- Find all followers of User B.
- Insert the tweet ID into a cached timeline list (e.g., Redis List) for each follower.
- Pros: Zero Latency on Read. The timeline is pre-computed.
- Cons: Write Amplification. If Justin Bieber (100M followers) tweets, the system must perform 100M cache writes for a single post. This is the "celebrity" fan-out problem.
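Fan-out on write is a loop over followers at post time. The sketch below uses plain dicts as stand-ins for the social graph and the Redis timeline cache; the trim mirrors Redis's LPUSH + LTRIM pattern:

```python
from collections import defaultdict

followers = {"bob": ["alice", "carol"]}          # social graph stand-in
timelines: dict[str, list[int]] = defaultdict(list)  # Redis stand-in

MAX_TIMELINE = 800  # keep only the hot head of each cached timeline

def post_tweet(author: str, tweet_id: int) -> None:
    """Fan-out on write: push the new ID to every follower's cache."""
    for follower in followers.get(author, []):
        cache = timelines[follower]
        cache.insert(0, tweet_id)   # newest first, like LPUSH
        del cache[MAX_TIMELINE:]    # bound memory, like LTRIM

post_tweet("bob", 101)
post_tweet("bob", 102)
print(timelines["alice"])  # [102, 101]
```

With two followers the loop is trivial; with 100M it becomes 100M cache writes dispatched through a queue, which is exactly the amplification the cons bullet describes.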
Approach C: The Hybrid Strategy (Winner)
We combine the two.
- Normal Users: Use Push Model. Their tweet is pushed to their few hundred followers' caches immediately.
- Celebrities (VIPs): Use Pull Model. Justin Bieber's tweets are not pushed. Instead, when a user views their timeline, they pull Bieber's tweets separately and merge them.
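The hybrid read path then merges two sources: the follower's pre-computed cache and the celebrity tweets pulled live. A sketch (the threshold value is an illustrative assumption, not a published figure):

```python
CELEBRITY_THRESHOLD = 1_000_000  # assumed cutoff above which we stop pushing

def read_timeline(pushed: list[int], celebrity_tweets: list[int],
                  limit: int = 10) -> list[int]:
    """Merge the pushed (cached) timeline with live-pulled VIP tweets.

    Both inputs are newest-first lists of Snowflake IDs, so sorting the
    union descending interleaves them correctly by time.
    """
    return sorted(pushed + celebrity_tweets, reverse=True)[:limit]

pushed = [105, 103, 100]  # from the follower's Redis cache
vip = [104, 102]          # pulled live from the tweet store
print(read_timeline(pushed, vip))  # [105, 104, 103, 102, 100]
```

The key property: the expensive pull is now bounded by the handful of VIPs a user follows, not by their full followee list.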
5. Storage Optimization: Sharding
We must shard our databases. Sharding by User ID vs Tweet ID?
- Sharding by User ID: All tweets for a user live on one shard.
- Pro: Fast to fetch all tweets for one user.
- Con: "Hot Partition" problem (Celebrities).
- Sharding by Tweet ID: Tweets are distributed randomly (or by Snowflake time).
- Pro: Even distribution of load.
- Con: Fetching timeline requires querying all shards (Scatter-Gather).
Industry Standard: Often a mix; time-based sharding (via Snowflake IDs) combined with secondary indexes on user ID keeps both access patterns workable.
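The routing difference between the two schemes can be shown in a few lines. This sketch hashes keys to a fixed shard count (shard count and key formats are illustrative):

```python
import hashlib

NUM_SHARDS = 16

def shard_for(key: str) -> int:
    """Stable hash routing: the same key always lands on the same shard."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# User-ID sharding: all of one user's tweets live on a single shard.
user_shard = shard_for("user:12345")

# Tweet-ID sharding: 100 tweets scatter across shards, so a timeline
# read must query many of them (scatter-gather).
shards_touched = {shard_for(f"tweet:{i}") for i in range(100)}
print(user_shard, len(shards_touched))  # second number is close to NUM_SHARDS
```

The trade-off is visible directly: user-ID routing concentrates a celebrity's read load on one shard (hot partition), while tweet-ID routing spreads load but forces every read to fan out.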
6. Caching & Reliability
Redis Strategy
The timeline should be stored as a Redis List or Sorted Set.
- Key: timeline:{user_id}
- Value: List of tweet_ids.
- We only need to cache the last 800 tweets per user. Older tweets can be fetched from the DB on demand (scroll).
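The cache contract (bounded, newest-first, paged reads) can be sketched with a plain dict standing in for Redis sorted sets; the comments note the corresponding Redis commands:

```python
MAX_CACHED = 800  # cap per the design above

# key -> [(score, tweet_id), ...], newest-first; a stand-in for a Redis ZSET.
cache: dict[str, list[tuple[int, int]]] = {}

def add_to_timeline(user_id: int, tweet_id: int) -> None:
    """Like ZADD with the Snowflake ID doubling as the score, then a trim."""
    entries = cache.setdefault(f"timeline:{user_id}", [])
    entries.append((tweet_id, tweet_id))
    entries.sort(reverse=True)
    del entries[MAX_CACHED:]  # like ZREMRANGEBYRANK, keeping the newest 800

def read_page(user_id: int, offset: int, count: int) -> list[int]:
    """Like ZREVRANGE offset offset+count-1: one page of the timeline."""
    entries = cache.get(f"timeline:{user_id}", [])
    return [tid for _, tid in entries[offset:offset + count]]

add_to_timeline(42, 101)
add_to_timeline(42, 105)
add_to_timeline(42, 103)
print(read_page(42, 0, 2))  # [105, 103]
```

Reads past the 800-entry cap fall through to the tweet store, which is the cold path the "scroll" bullet refers to.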
Replication
- Primary-Replica: Master accepts writes, Replicas serve reads.
- Quorum Writes (Cassandra): With a replication factor of 3, require acknowledgment from at least 2 nodes (a quorum) before a write succeeds, ensuring durability while tolerating one node failure.
7. Summary
Designing Twitter requires handling conflicting constraints. The Hybrid Fan-out architecture allows us to serve 99% of users with pre-computed (O(1)) timelines while preventing system collapse when celebrities tweet.
By leveraging Redis for hot timelines, Cassandra for massive write ingestion, and a Snowflake ID generator for distributed sorting, we build a system that is both resilient and lightning fast.