Design Discord: Scaling Real-Time Chat to Billions

Designing Discord presents a different set of challenges compared to Twitter or Facebook. While social networks focus on the "Feed" (asynchronous, algorithmic), Discord focuses on Presence and Real-Time Communication (synchronous, chronological).
In this deep dive, we will explore how to build a system that supports 15+ million concurrent users and processes about a billion messages per day, with messages delivered in under 100 ms.
1. Requirements & Scale
Functional Requirements
- Servers (Guilds) & Channels: Users can create persistent chat rooms.
- Real-Time Chat: Messages must appear instantly for all online members in a channel.
- Presence: Users need to see who is Online, Idle, or Playing a game.
- Voice/Video: Low-latency voice chat (out of scope for this text-focused article, but we'll touch on the signaling).
Constraints & Capacity
- Concurrent Users: 15 Million online at once.
- Message Volume: 1 Billion messages/day.
- Read/Write Ratio: Extremely high write volume compared to Twitter. People chat constantly.
- Latency: Critical. < 100ms for message delivery.
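To make these numbers concrete, here is a rough back-of-the-envelope calculation. The message volume and concurrent-user count come from the constraints above; the peak factor and the connections-per-node figure are illustrative assumptions, not Discord's published numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# From the constraints above.
messages_per_day = 1_000_000_000
concurrent_users = 15_000_000

# Hypothetical planning assumptions (not from the requirements).
PEAK_FACTOR = 3                 # peak traffic relative to the daily average
CONNECTIONS_PER_NODE = 100_000  # WebSocket connections one Gateway node holds

avg_writes_per_sec = messages_per_day / SECONDS_PER_DAY
peak_writes_per_sec = avg_writes_per_sec * PEAK_FACTOR

# Ceiling division: a partially filled node still has to exist.
gateway_nodes = -(-concurrent_users // CONNECTIONS_PER_NODE)

print(f"average writes/sec: {avg_writes_per_sec:,.0f}")
print(f"peak writes/sec:    {peak_writes_per_sec:,.0f}")
print(f"gateway nodes:      {gateway_nodes}")
```

Under these assumptions the system averages roughly 11,600 message writes per second and needs about 150 Gateway nodes just to hold the open connections. Each write is then fanned out to every online member of the channel, so delivery traffic is far larger than write traffic.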
2. High-Level Architecture
Discord is a hybrid system. It uses REST APIs for actions (posting a message, changing settings) and WebSockets for delivering events (new message received, user joined).
The Gateway Service (Stateful)
Unlike standard REST APIs, the Gateway needs to maintain persistent TCP/WebSocket connections.
- When a user comes online, they connect to a specific Gateway node.
- This connection is long-lived.
- Challenge: If a user sends a message to a Guild, how do we know which Gateway nodes hold the connections for the other 1000 members of that Guild?
3. The "Guild Affinity" Problem
In a platform like Slack or Discord, activity is scoped to a "Guild" (Server). It is inefficient to broadcast a message to the entire world. We only need to broadcast to members of that Guild.
Solution: Consistent Hashing Ring
We route all users and events for a specific Guild to a specific set of backend nodes.
- Gateway Routing: When a user connects, we don't just pick a random node. We want efficient routing for presence updates.
- Guild Service: Keeps the state of "Who is in Guild A?".
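However, storing the session state of 15M users is hard. Discord solves this by using Consistent Hashing to distribute Guilds across Gateway nodes. Each Guild maps to a predictable node, and adding or removing a node remaps only a small fraction of Guilds instead of reshuffling all of them.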
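A minimal sketch of a consistent hashing ring follows. It illustrates the general technique, not Discord's implementation; the class name HashRing, the node names, and the choice of 100 virtual nodes per machine are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key: str) -> int:
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add_node(self, node: str) -> None:
        # Each physical node owns many points, which evens out the load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, guild_id) -> str:
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise LookupError("ring is empty")
        h = self._hash(str(guild_id))
        idx = bisect.bisect_right(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["gw-1", "gw-2", "gw-3"])
owner = ring.node_for(1234567890)  # always the same node for this guild
```

When a node is removed, only the Guilds that lived on that node move; every other Guild keeps its owner. That property is what makes the ring attractive for stateful services.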
Note: In reality, Discord has a "Session" system where your WS connection might live on Node X, but Node X subscribes to updates for the Guilds you are part of.
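The subscription idea in the note can be sketched as a small registry that records which nodes currently need events for which Guilds. This is a single-process illustration under assumed names (GuildSubscriptions, node-x, guild-a); a real deployment would keep this state in a distributed service.

```python
from collections import defaultdict


class GuildSubscriptions:
    """Tracks which Gateway nodes need events for which guilds."""

    def __init__(self):
        # guild_id -> node_id -> number of local sessions in that guild
        self._counts = defaultdict(lambda: defaultdict(int))

    def session_connected(self, node_id, guild_ids):
        for guild_id in guild_ids:
            self._counts[guild_id][node_id] += 1

    def session_disconnected(self, node_id, guild_ids):
        for guild_id in guild_ids:
            nodes = self._counts.get(guild_id)
            if not nodes or node_id not in nodes:
                continue
            nodes[node_id] -= 1
            if nodes[node_id] <= 0:
                del nodes[node_id]  # last local session left: unsubscribe
            if not nodes:
                del self._counts[guild_id]

    def nodes_for(self, guild_id):
        """Nodes that must receive an event published to this guild."""
        return set(self._counts.get(guild_id, {}))


subs = GuildSubscriptions()
subs.session_connected("node-x", ["guild-a", "guild-b"])
subs.session_connected("node-y", ["guild-a"])
targets = subs.nodes_for("guild-a")  # both nodes hold members of guild-a
```

A message posted in a Guild is then sent once to each subscribed node, and each node delivers it to its own local sockets.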
4. Data Storage: The Migration to ScyllaDB
Discord famously migrated from MongoDB to Cassandra and finally to ScyllaDB (a C++ rewrite of Cassandra). Why?
The Problem with MongoDB
- Data was stored in "proliferated" documents (chunking messages into groups of 50).
- Random reads (jumping to an old message) became slow.
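- Latency spikes: MongoDB's locking made latency unpredictable under load. Later, on Cassandra (which runs on the JVM), Garbage Collection (GC) pauses caused similar spikes. ScyllaDB is written in C++ and has no garbage collector.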
The ScyllaDB Model
Messages are immutable and time-series data.
- Partition Key: channel_id (All messages for a channel live together).
- Clustering Key: snowflake_id (Sort messages by time).
This allows extremely fast lookups for "Give me the last 50 messages in Channel X".
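- Bucket: A channel with 1B messages would otherwise become one enormous, hot partition. To avoid this, we bucket messages by time (e.g., 10 days per bucket) and make the bucket part of the partition key, so each partition holds at most one bucket's worth of messages for a channel.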
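The bucket can be derived from the message ID itself, because a Discord snowflake stores a millisecond timestamp (relative to the first second of 2015) in its upper bits. The sketch below assumes a 10-day bucket counted from that epoch; the helper names are made up for the example.

```python
DISCORD_EPOCH_MS = 1_420_070_400_000      # 2015-01-01T00:00:00Z in Unix ms
BUCKET_MS = 10 * 24 * 60 * 60 * 1000      # 10 days, as in the example above


def snowflake_timestamp_ms(snowflake_id: int) -> int:
    """Unix timestamp (ms) encoded in the upper bits of a snowflake."""
    return (snowflake_id >> 22) + DISCORD_EPOCH_MS


def make_snowflake(timestamp_ms: int, sequence: int = 0) -> int:
    """Build a snowflake for illustration; worker/process bits are left at 0."""
    return ((timestamp_ms - DISCORD_EPOCH_MS) << 22) | (sequence & 0xFFF)


def bucket_for(snowflake_id: int) -> int:
    """Number of whole 10-day windows since the Discord epoch."""
    return (snowflake_id >> 22) // BUCKET_MS


def partition_key(channel_id: int, snowflake_id: int) -> tuple:
    """Composite partition key: (channel_id, bucket)."""
    return (channel_id, bucket_for(snowflake_id))


# A message sent 25 days after the epoch falls into the third window.
msg_id = make_snowflake(DISCORD_EPOCH_MS + 25 * 24 * 60 * 60 * 1000)
key = partition_key(channel_id=1001, snowflake_id=msg_id)
```

To fetch the last 50 messages, a reader starts at the bucket for the current time and walks backwards through older buckets until it has collected enough rows.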
5. Handling "The Crowd" (Viral Events)
What happens when a YouTuber creates a server and 500,000 people join instantly? Or a "Raid" happens?
Fan-Out Optimization
If we simply loop over every member of the guild and call send_ws_event() for each one, a single sequential pass takes too long for 500k members.
Optimization:
- Worker Pools: The fan-out task is pushed to a queue and processed by a pool of workers in parallel.
- Sharded WebSockets: The Guild's connections are sharded across multiple machines. The event is sent to the "Guild Manager", which forwards it to the 50 machines holding the actual sockets.
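The two optimizations combine naturally: group the online recipients by the machine that holds their socket, then hand one batch per machine to a worker pool. The following is a sketch under assumptions; fan_out, group_by_node, and the send_to_node callback are hypothetical names, and a thread pool stands in for a real queue plus workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_node(member_ids, session_node):
    """Group online members by the node holding their socket.

    session_node maps user_id -> node_id for users who are online.
    """
    batches = defaultdict(list)
    for user_id in member_ids:
        node = session_node.get(user_id)
        if node is not None:  # offline members have no socket to push to
            batches[node].append(user_id)
    return dict(batches)


def fan_out(event, member_ids, session_node, send_to_node, max_workers=8):
    """Send one batched request per node instead of one per member."""
    batches = group_by_node(member_ids, session_node)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(send_to_node, node, event, users)
            for node, users in batches.items()
        ]
        for future in futures:
            future.result()  # surface any delivery error
    return len(batches)
```

With 500k members spread over 50 machines, the Guild Manager issues 50 batched sends rather than 500k individual ones; each machine then writes to its local sockets.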
6. Real-Time Presence (Who is Online?)
Updating "User is typing..." or "User is playing Elden Ring" is heavy write traffic.
- Do not write to DB: Presence is ephemeral. If the server crashes, it doesn't matter if we lose the "typing" status.
- Use Ephemeral State (Redis/Erlang ETS): Store presence in a distributed in-memory grid.
- Optimization: smart throttling. Do not broadcast "typing" more than once every 5 seconds per user.
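The throttling rule can be sketched as a small in-memory gate. TypingThrottle is a hypothetical name, and the injectable clock exists only to make the behavior easy to test; a production version would also evict entries for users who have gone idle.

```python
import time


class TypingThrottle:
    """Allows at most one "typing" broadcast per user per interval."""

    def __init__(self, min_interval=5.0, clock=time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent = {}  # user_id -> time of the last broadcast

    def should_broadcast(self, user_id) -> bool:
        now = self._clock()
        last = self._last_sent.get(user_id)
        if last is not None and now - last < self._min_interval:
            return False  # too soon: drop this event silently
        self._last_sent[user_id] = now
        return True


throttle = TypingThrottle()
if throttle.should_broadcast(user_id=42):
    pass  # publish the "typing" event to the channel here
```

Dropped events are never retried. Because presence is ephemeral, losing a keystroke notification costs nothing, while skipping the broadcast saves a full fan-out.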
7. Voice & Video (Signaling vs Media)
When you join a voice channel, you aren't sending audio over the WebSocket.
- Signaling (WebSocket): "I want to join Room A."
- Authentication: Server grants a token and an IP:Port of a Selective Forwarding Unit (SFU).
- Media (UDP): The client connects to the SFU via UDP (WebRTC). The SFU forwards your audio packets to other participants.
Summary
Discord is a study in latency minimization.
- Databases: ScyllaDB for massive write throughput and range queries.
- Networking: WebSockets for control, UDP for media.
- Architecture: Consistent hashing for stateful guild management.
By treating chat not just as data storage but as a live event stream, Discord achieved the scale we see today.