Okay, let's discuss Database Scaling. As systems grow, a single database server often becomes a performance bottleneck or a single point of failure. We need techniques to scale the database layer horizontally. The two primary techniques are Replication and Sharding (Partitioning).

1. Database Replication

  • Definition: Replication involves creating and maintaining multiple copies (replicas) of the same database on different servers.
  • Primary Goals:
    • High Availability / Fault Tolerance: If one database server fails, replicas can take over, ensuring the system remains operational.
    • Read Scalability: Read requests can be distributed across multiple replicas, increasing read throughput.
  • Common Replication Models:
    • Master-Slave Replication:
      • How it works: One server acts as the master (primary), handling all write operations. Data changes on the master are asynchronously or synchronously replicated to one or more slave (secondary/replica) servers. Slaves typically handle read operations.
      • Pros: Improves read performance significantly, provides failover capability (a slave can be promoted to master if the original master fails).
      • Cons: Write operations are still limited by the single master. There can be replication lag (slaves might be slightly behind the master, especially with asynchronous replication), leading to potentially stale reads. Promoting a slave can be complex.
      • Synchronous vs. Asynchronous:
        • Synchronous: Master waits for confirmation from at least one slave before confirming the write to the client. Ensures higher consistency but increases write latency and can impact availability if slaves are slow/down.
        • Asynchronous: Master sends writes to slaves but doesn't wait for confirmation. Lower write latency and higher availability, but increased risk of data loss during failover if replicas haven't received the latest writes (replication lag).
    • Master-Master Replication:
      • How it works: Two or more servers act as masters, accepting both read and write operations. Changes on one master are replicated to the other master(s).
      • Pros: Improved write availability (writes can go to any master), allows load balancing of writes.
      • Cons: Much more complex to manage. Conflict resolution is a major challenge (what happens if the same data is updated differently on two masters simultaneously?). Potential for data inconsistencies if not handled carefully.
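The master-slave pattern above can be sketched as a simple read/write router. This is a hypothetical, minimal illustration: plain dicts stand in for database servers, the `ReplicatedDatabase` class is invented for this sketch, and replication is applied instantly (a real system replicates over the network, synchronously or asynchronously, which is where replication lag comes from).

```python
import random

class ReplicatedDatabase:
    """Sketch of master-slave routing: writes to the master, reads to replicas."""

    def __init__(self, master, replicas):
        self.master = master      # single master: handles all writes
        self.replicas = replicas  # read replicas: may lag behind the master

    def write(self, key, value):
        # All writes go through the one master, which is why write
        # throughput is still bounded by a single server.
        self.master[key] = value
        # Replication shown as instant for brevity; with asynchronous
        # replication this step happens later, so reads can be stale.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Spread reads across replicas for read scalability.
        replica = random.choice(self.replicas)
        return replica.get(key)

db = ReplicatedDatabase(master={}, replicas=[{}, {}])
db.write("user:1", "Alice")
print(db.read("user:1"))  # "Alice"
```

Note the trade-off encoded in `write`: making replication synchronous (waiting on the replicas) tightens consistency but raises write latency, exactly as described above.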

2. Database Sharding (Partitioning)

  • Definition: Sharding involves splitting a large database horizontally into smaller, more manageable pieces called shards. Each shard contains a subset of the data and is typically hosted on a separate database server. Unlike replication (which copies all data), sharding divides the data.
  • Primary Goals:
    • Write Scalability: Distributes write load across multiple servers.
    • Handling Massive Datasets: Allows storing datasets too large to fit on a single server.
  • Core Concept: Sharding Key: Data is partitioned based on a sharding key (a specific column or set of columns in the data). The choice of sharding key is crucial.
  • Common Sharding Strategies:
    • Range-Based Sharding:
      • How it works: Data is sharded based on a range of values in the sharding key (e.g., User IDs 1-1000 on Shard 1, 1001-2000 on Shard 2).
      • Pros: Relatively simple to implement. Efficient for range queries (e.g., find all users with IDs between 500 and 1500).
      • Cons: Can lead to hotspots (uneven data distribution) if data isn't uniformly distributed across ranges (e.g., more users signing up recently might overload the latest shard).
    • Hash-Based Sharding (Consistent Hashing is often used):
      • How it works: A hash function is applied to the sharding key, and the result determines which shard the data belongs to.
      • Pros: Generally leads to a more uniform data distribution across shards, reducing hotspots.
      • Cons: Range queries become inefficient as they typically require querying all shards. The mapping of key to shard can be less intuitive.
    • Directory-Based Sharding:
      • How it works: A separate lookup service (or table) maintains a mapping between sharding keys and the shard location.
      • Pros: Offers the most flexibility in mapping keys to shards.
      • Cons: The lookup service itself can become a performance bottleneck or a single point of failure. Adds latency due to the extra lookup step.
  • Challenges of Sharding:
    • Increased Complexity: Sharding significantly increases the complexity of the system (deployment, management, monitoring).
    • Cross-Shard Queries: Queries that need data from multiple shards (e.g., JOINs across shards) become very complex and often inefficient. Requires careful data modeling or application-level aggregation.
    • Rebalancing: Adding or removing shards requires redistributing data, which can be a complex and resource-intensive operation (re-sharding).
    • Schema Changes: Applying schema changes across all shards can be challenging.
    • Hotspots: Even with good strategies, hotspots can still occur depending on data access patterns.
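The three sharding strategies can be sketched as functions that map a sharding key (a user ID here) to a shard index. This is an illustrative assumption-laden sketch: the shard count, range boundaries, and function names are all made up for the example, and MD5 stands in for whatever stable hash a real system would use.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # assumed shard count for illustration

def range_shard(user_id, boundaries=(1000, 2000, 3000)):
    # Range-based: IDs 1-1000 -> shard 0, 1001-2000 -> shard 1, etc.
    # Simple and range-query friendly, but prone to hotspots if new
    # IDs cluster in the latest range.
    return bisect.bisect_left(boundaries, user_id)

def hash_shard(user_id):
    # Hash-based: a stable hash (not Python's built-in hash(), which
    # varies between processes) spreads keys uniformly, but range
    # queries now have to touch every shard.
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

DIRECTORY = {}  # directory-based: explicit key -> shard lookup table

def directory_shard(user_id):
    # Most flexible mapping, but the directory itself must stay
    # fast and highly available or it becomes the bottleneck.
    return DIRECTORY.setdefault(user_id, hash_shard(user_id))

print(range_shard(1500))  # 1
```

Note that plain `hash(key) % N` style sharding forces most keys to move when `N` changes; consistent hashing exists precisely to limit that rebalancing cost.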

Combining Replication and Sharding:

Large-scale systems often use both replication and sharding. Each shard is typically replicated (e.g., using master-slave) to ensure high availability and read scalability within that shard.
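Combining the two looks like hash-sharding at the top level with a master plus replicas inside each shard. Again a hypothetical sketch with invented names, dicts standing in for servers, and instant replication for brevity:

```python
import hashlib
import random

class ShardedReplicatedStore:
    """Sketch: each shard has its own master and read replicas."""

    def __init__(self, num_shards, replicas_per_shard=2):
        self.shards = [
            {"master": {}, "replicas": [{} for _ in range(replicas_per_shard)]}
            for _ in range(num_shards)
        ]

    def _shard_for(self, key):
        # Hash-based routing picks the shard; replication lives inside it.
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def write(self, key, value):
        shard = self._shard_for(key)
        shard["master"][key] = value
        for replica in shard["replicas"]:  # instant replication for brevity
            replica[key] = value

    def read(self, key):
        # Reads load-balance across that shard's replicas.
        return random.choice(self._shard_for(key)["replicas"]).get(key)

store = ShardedReplicatedStore(num_shards=4)
store.write("user:7", "Bob")
print(store.read("user:7"))  # "Bob"
```

This layering gives write scalability across shards and read scalability plus failover within each shard, which is the combination the paragraph above describes.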

In an Interview:

Be ready to discuss when and why you would use replication (HA, read scaling) versus sharding (write scaling, massive datasets). Explain the different models/strategies and their associated trade-offs (consistency vs. availability, complexity, hotspots). Mentioning the challenges of sharding demonstrates a deeper understanding.
