System Design: CAP Theorem & Availability

≥ Data can be either consistent or highly available!

After learning about the significance of CDN and DNS in System Design in the last release, let's learn the role of Consistency and Availability in the world of System Design. Let's start with availability and consistency first, then we'll move to the CAP Theorem.

Availability

Availability refers to the ability of a system to provide its services to clients even in the presence of failures. This is often measured in terms of the percentage of time that the system is up and running, also known as its uptime.

Availability Patterns

Availability is measured as a percentage of uptime and defines the proportion of time that a system is functional and working. Availability is affected by system errors, infrastructure problems, malicious attacks, and system load. Cloud applications typically provide users with a service level agreement (SLA), which means that applications must be designed and implemented to maximize availability.

Calculate SLA & Uptime from here: https://uptime.is/

Consistency

Consistency, on the other hand, refers to the property that all clients see the same data at the same time. This is important for maintaining the integrity of the data stored in the system.

There are three main types of consistency patterns:

  • Strong consistency

  • Weak consistency

  • Eventual Consistency

Each of these patterns has its advantages and disadvantages, and the choice of which pattern to use will depend on the specific requirements of the application or system.

Read more: Consistency Patterns in Distributed Systems (cs.fyi)

In distributed systems, it is often a trade-off between availability and consistency. Systems that prioritize high availability may sacrifice consistency, while systems that prioritize consistency may sacrifice availability. Different distributed systems use different approaches to balance the trade-off between availability and consistency, such as using replication or consensus algorithms.

Partition Tolerance

Imagine you're online banking, checking your account balance. This information is stored on multiple servers across the country. Now, picture a sudden power outage affecting some servers (a network partition) across the country.

Partition tolerance is like having a safety net for banks in such situations. Let's try to understand it like this:

  • Partition: This happens when the communication between some servers gets cut off, similar to the power outage affecting parts of the system. This will disrupt the services provided by the banks for some time.

  • Tolerance: When the bank's systems can handle this disruption and keep functioning, at least partially. Even with the partition, you might still be able to log in and see your past transactions, and also make the transactions as usual.

Why is this crucial in banking? Two main reasons:

  • Accuracy is Important (Number Game): Banks prioritize consistency. They need to ensure everyone, from you to the bank staff, sees the same information about your account balance at all times. This prevents any confusion or potential errors, like seeing a different balance than what exists. If a partition occurs, the system might temporarily restrict some functions, like fund transfers, until communication is restored across all servers (like waiting for the power to come back on everywhere). This ensures data integrity is maintained.

  • Keeping things running (Work should never stop): While consistency is essential, banks also strive for availability. They want their systems to be accessible even during minor disruptions. This means you might still be able to access basic information like account history or contact customer service, even if certain functions like transfers are unavailable during a partition.

The choice between prioritizing consistency or availability depends on the specific situation. But partition tolerance ensures the bank's system can handle these network hiccups, keeping your information secure and the system running smoothly, even when the lights flicker momentarily. So, you can relax knowing your online banking experience is protected!

CAP Theorem Explained.

The CAP theorem is kind of like a rule of thumb for building large, online systems, especially those that store and manage data. Imagine you're running a massive online store with millions of users, all wanting to see product details, add items to carts, and make purchases. The CAP theorem helps you decide how to design this system to handle these requests effectively.

CAP stands for Consistency, Availability, and Partition Tolerance. These are three crucial qualities you want in your online store system or any distributed system.

System designers use the CAP theorem to decide which two qualities are most important for their specific system. For an online store, both consistency and availability are crucial. The choice depends on the specific needs:

  • If ensuring everyone sees the exact stock level is paramount, consistency might be prioritized, even if it means occasional hiccups in availability.

  • If keeping the store always accessible for browsing and basic functions is more important, availability might be prioritized, with some tolerance for temporary inconsistencies.

However, The CAP theorem doesn't tell you exactly what to do, but it helps designers understand the trade-offs involved and make informed decisions when building large, online systems. It's like having a guide to navigate the challenges of keeping your online store running smoothly for all your customers!

Read More: A plain English introduction to CAP theorem « Kaushik Sathupadi

Conclusion

The CAP theorem isn't a limitation, but a valuable guide for building robust and efficient distributed systems. It reminds us that we can't have the best of all worlds; we must choose the two most critical properties for our specific system – consistency or availability, and design accordingly with partition tolerance in mind.

By understanding these trade-offs, we can ensure our systems meet the needs of our users and operate smoothly even in the face of network hiccups. Remember, the CAP theorem empowers us to make informed decisions for building reliable and scalable systems in the ever-evolving world of technology.

Did you find this article valuable?

Support Aryan Chaurasia by becoming a sponsor. Any amount is appreciated!