How Discord Stores Trillions of Messages

By Mohamad Shybly
June 06, 2024

Background

Discord is a popular communication app, widely used in gaming and live-streaming communities for text chat, audio calls, and call-ins during YouTube podcasts. The platform has grown rapidly and now stores trillions of messages, which creates significant database-management challenges. Discord first moved from MongoDB to Cassandra for its scalability, fault tolerance, and relatively low maintenance. As the dataset kept growing, the team moved again, from Cassandra to ScyllaDB, to manage it more efficiently.

Key Concepts and Terms

  1. MongoDB: A NoSQL database known for its scalability and flexibility, using JSON-like documents with dynamic schemas.
  2. Cassandra: A highly scalable, distributed NoSQL database designed to handle large amounts of data across many commodity servers without a single point of failure.
  3. ScyllaDB: A Cassandra-compatible NoSQL database implemented in C++ and designed for higher throughput and lower latency.
  4. Primary Storage Engine: The main database system used to store and manage data.
  5. Hot Partition: A scenario where a specific partition of data receives a disproportionately high amount of traffic, leading to performance bottlenecks.
  6. LSM (Log-Structured Merge) Tree: A data structure that uses multiple levels of sorted runs for storing data, optimized for write-heavy workloads.
  7. Bloom Filters: A space-efficient probabilistic data structure used to test whether an element is a member of a set, reducing the need for expensive disk accesses.
  8. Compaction: The process of merging data from multiple files into a single file to reduce redundancy and improve read performance.
  9. Snowflake ID: A unique identifier that combines a timestamp with other attributes to ensure ordered uniqueness across distributed systems.
  10. gRPC: A high-performance, open-source remote procedure call (RPC) framework initially developed by Google.
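
To make the Snowflake ID entry concrete, here is a minimal sketch of decoding one. It assumes the bit layout and epoch described in Discord's public developer documentation (a millisecond timestamp in the upper bits, then worker, process, and increment fields); the function name `decode_snowflake` is made up for illustration and is not part of any library.

```python
from datetime import datetime, timezone

# Assumption: Discord's documented epoch, the first second of 2015, in milliseconds.
DISCORD_EPOCH_MS = 1420070400000


def decode_snowflake(snowflake: int) -> dict:
    """Split a 64-bit Snowflake ID into its fields (hypothetical helper)."""
    # The upper bits hold milliseconds elapsed since the epoch.
    timestamp_ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return {
        "timestamp_ms": timestamp_ms,
        "created_at": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "worker_id": (snowflake >> 17) & 0x1F,   # 5 bits
        "process_id": (snowflake >> 12) & 0x1F,  # 5 bits
        "increment": snowflake & 0xFFF,          # 12 bits
    }


fields = decode_snowflake(175928847299117063)
print(fields["created_at"].isoformat(), fields["increment"])
```

Because the timestamp occupies the most significant bits, sorting IDs numerically also sorts them by creation time, which is what makes such IDs convenient as an ordering key for messages.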

Article: How Discord Stores Trillions of Messages

Introduction

Discord’s engineering team faced a significant challenge: how to efficiently store and manage trillions of messages. By 2017, they had moved from MongoDB to Cassandra in search of a more scalable solution. More recently, they moved to ScyllaDB to further improve their system’s performance and manageability. This article explores the reasons behind these transitions, the problems encountered, and the solutions implemented.

Challenges with Cassandra

Discord’s transition to Cassandra aimed to leverage its scalability and fault-tolerance. Cassandra uses a distributed architecture, partitioning data across nodes and replicating it to ensure reliability. However, managing a rapidly growing dataset introduced several challenges:

  1. Hot Partitions: Cassandra’s performance was significantly impacted by hot partitions, where specific partitions received a high volume of concurrent reads and writes. This led to unpredictable latency and frequent on-call incidents for the engineering team.
  2. Garbage Collection: Cassandra runs on the JVM, so the team had to tune the garbage collector carefully. Long garbage collection pauses caused latency spikes and hurt system stability.
  3. Compaction Issues: Compaction, essential for merging and optimizing data on disk, often fell behind due to the volume of writes, causing additional performance bottlenecks.
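
As a rough illustration of the hot-partition problem above, the sketch below measures how unevenly a batch of requests is spread across partition keys. The channel names, the request mix, and the 50% threshold are all invented for the example; real detection would rely on the database's own metrics.

```python
from collections import Counter


def partition_load(requests):
    """Return each partition key's share of the total request count."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def hot_partitions(requests, threshold=0.5):
    """List partition keys that receive more than `threshold` of all traffic."""
    return [key for key, share in partition_load(requests).items() if share > threshold]


# Hypothetical traffic: one busy channel and two quiet ones.
requests = ["announcements"] * 90 + ["small-1"] * 5 + ["small-2"] * 5
print(hot_partitions(requests))
```

When one key dominates like this, the nodes that own that partition do most of the work while the rest of the cluster sits comparatively idle, which is why adding nodes alone does not fix the latency.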

Migration to ScyllaDB

After experimenting with ScyllaDB, Discord observed several improvements in performance and reliability. ScyllaDB, written in C++, provided better workload isolation and eliminated garbage collection issues. The transition process involved:

  1. Dual Writing: During the migration, Discord wrote data to both Cassandra and ScyllaDB to ensure consistency.
  2. Data Services: Discord introduced intermediary data services written in Rust that coalesce requests, so that concurrent requests for the same data result in a single database query, reducing load on the database.
  3. Improved Storage Architecture: The new ScyllaDB cluster utilized local SSDs for speed and RAID configurations for durability, significantly enhancing write and read performance.
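
The request coalescing described in the list above can be sketched as follows. This is a simplified Python illustration, not Discord's Rust implementation: the `Coalescer` class and the `fake_fetch` function are hypothetical, and the sketch assumes a single process with an asyncio event loop.

```python
import asyncio


class Coalescer:
    """Share one in-flight fetch among all concurrent callers asking for the same key."""

    def __init__(self, fetch):
        self._fetch = fetch      # async function: key -> value
        self._in_flight = {}     # key -> asyncio.Task currently fetching it

    async def get(self, key):
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key starts the real query.
            task = asyncio.create_task(self._run(key))
            self._in_flight[key] = task
        # Later callers simply wait on the same task.
        return await task

    async def _run(self, key):
        try:
            return await self._fetch(key)
        finally:
            # Allow a fresh query once this one has finished.
            self._in_flight.pop(key, None)


calls = []


async def fake_fetch(key):
    """Stand-in for a database query; records how often it is invoked."""
    calls.append(key)
    await asyncio.sleep(0.01)
    return f"row:{key}"


async def main():
    coalescer = Coalescer(fake_fetch)
    return await asyncio.gather(*(coalescer.get("channel-1") for _ in range(5)))


results = asyncio.run(main())
print(len(results), "callers served by", len(calls), "query")
```

Five concurrent callers receive the same row while the stand-in database is queried only once. A burst of reads on a popular channel therefore costs roughly one query rather than one per user.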

Benefits Realized

The migration to ScyllaDB and the introduction of data services yielded substantial benefits:

  1. Reduced Latency: Tail (p99) latency for fetching historical messages dropped from as high as 125ms on Cassandra to about 15ms on ScyllaDB, providing a smoother user experience.
  2. Lower Maintenance: The elimination of garbage collection issues reduced the frequency of on-call incidents and system maintenance tasks.
  3. Scalability: ScyllaDB’s architecture allowed Discord to manage their growing dataset more efficiently, reducing the number of nodes required from 177 Cassandra nodes to 72 ScyllaDB nodes.
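
Tail latency is usually reported as a high percentile such as p99. The sketch below shows one common way to compute it (the nearest-rank method) and works out the node reduction quoted above. The latency samples are invented for illustration; only the node counts come from the article.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer form of ceil(p * n / 100), clamped to at least 1.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Made-up latencies in ms: most requests are fast, a few are slow.
latencies_ms = [5] * 98 + [125] * 2
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

# Node counts quoted in the article: 177 Cassandra nodes, 72 ScyllaDB nodes.
node_reduction = (177 - 72) / 177

print(f"p50={p50}ms p99={p99}ms nodes reduced by {node_reduction:.0%}")
```

The median hides the slow requests entirely, while p99 exposes them, which is why tail latency is the more telling number for user experience.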

Conclusion

Discord’s journey from MongoDB to Cassandra and finally to ScyllaDB illustrates the challenges and solutions associated with managing large-scale distributed databases. By addressing issues such as hot partitions, garbage collection, and compaction, and by leveraging new technologies and architectures, Discord significantly improved their system’s performance and reliability. This transition not only ensured the efficient storage of trillions of messages but also enhanced the overall user experience on the platform.


