2024-05-10

CAP Theorem!

The CAP theorem, also known as Brewer’s theorem, is a fundamental principle in distributed systems that states it is impossible for a distributed data store to simultaneously provide all three of the following guarantees:

Consistency (C): Every read from the database returns the most recent write or an error. This means that all nodes in the system have the same data at the same time.
Availability (A): Every request (read or write) receives a response, even if some of the nodes are down. The system is always available for reads and writes.
Partition Tolerance (P): The system continues to operate despite arbitrary partitioning due to network failures. This means the system can handle communication breakdowns between nodes.

Partition Tolerance

What is a partition here? Well its what I initially thought off, its not disk or database partitions - but the English term PARTITION (separation) is what is that we are talking about here in CAP theory. Its the moment when a network failure/delay caused nodes in a distributed system to get separated ( part away)/ disconnect and become isolated, leading to a partition. Partition Tolerance, ensure that the system can operate even with some nodes being not able to communicate with each other due to network partitions. The system might not work perfectly, but it doesn’t completely shutdown. Example ? In a distributed database, that spans across multiple data centres, lets imagine a scenario where, one of the DC gets partitioned from the other due to some network outage. With partition tolerance the database on each data centres can function well, independently although they are not in sync. Why we can attain Consistency and Availability in a system that can tolerate partitions? Consistency: During partition, individual systems data are not in sync and there could be an issue with duplicates or determining how the valid data should look like would be difficult. Availability: If we have to keep a system consistent during a partitioned scenario, we would have to limit the availability of some part of the system to prevent data corruption.

Explanation

Consistency ensures that all nodes see the same data at the same time. For instance, if you write a piece of data to one node, all subsequent reads from any node should return that piece of data.
Availability ensures that every request gets a response (success or failure), without guaranteeing that it contains the most recent write. The system remains operational even if some of its components fail.
Partition Tolerance ensures the system continues to function even when network partitions occur. A network partition is a communication break between nodes in a distributed system.

Trade-offs

The CAP theorem asserts that in the presence of a network partition (P), a distributed system can choose to be either:

Consistent but not Available (CA): The system will not respond to requests until the partition is resolved to maintain consistency.
Available but not Consistent (AP): The system will respond to requests but may return stale or inconsistent data during the partition.

Practical Implications

CA (Consistency and Availability): These systems work well when partitions are rare, but they can’t maintain both consistency and availability during a partition. Example: traditional relational databases in a single-node setup.
CP (Consistency and Partition Tolerance): These systems ensure consistency even during partitions, but may not be available during such times. Example: HBase, MongoDB in strong consistency mode.
AP (Availability and Partition Tolerance): These systems ensure availability even during partitions, but might not be consistent. Example: DynamoDB, Cassandra.

Examples

AP Example: In Amazon’s DynamoDB, the system remains available during partitions but may return eventually consistent data.
CP Example: In Google’s Bigtable, the system maintains consistency during partitions but may become unavailable until the partition is resolved.
CA Example: Relational databases like PostgreSQL or MySQL in a single-node setup can provide consistency and availability but can’t handle network partitions.

Conclusion

The CAP theorem highlights the trade-offs involved in designing distributed systems. Understanding these trade-offs helps in making informed decisions about which properties to prioritize based on specific application requirements and operational environments. In practice, many systems aim for a balance, often providing “eventual consistency” to achieve high availability and partition tolerance.