ThatDataGuy

Reputation: 2109

How should I set the replication factor in Cassandra to account for node failure?

Let's say we have a Cassandra deployment with a replication factor of 2. By this I mean that we can tolerate the total loss of one node's persistent storage without overall data loss. I understand this to mean that each value is stored on at least two different nodes at any given time. Therefore the total storage required is at least the total size of the values × 2. I.e., if we need to store 100TB in the cluster, we would need at least 200TB of persistent storage across the nodes.
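To make that arithmetic concrete, here is a minimal sketch (a toy calculation, not Cassandra-specific; the 100TB figure is the one from the example above):

```python
# Total persistent storage needed is the data size times the replication
# factor, since each value is stored on RF different nodes.
def required_storage_tb(data_tb: float, replication_factor: int) -> float:
    return data_tb * replication_factor

print(required_storage_tb(100, 2))  # 200.0 TB for the example above
```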

However, as the node count increases, so does the likelihood of more than 1 node failing. Therefore, do we need to increase the replication factor as the number of nodes increases?

For example:

Let's assume that all components are 100% reliable, except for my nodes' local storage controllers, which from time to time completely corrupt all local storage with no possibility of restoration (i.e., data loss is total). All rack equipment, switches, power, cooling, etc. are perfect. I know this is not realistic.

Let's also assume that any data loss is really, really bad for this application.

Let's say my nodes have 1TB each of storage. For 100TB of values, I would need 200 machines to achieve a replication factor of 2 (i.e., I can lose any one node and still retain the data). However, if I believe that the simultaneous failure of 2 nodes in that set of 200 is likely, I will need to raise the replication factor to 3. Therefore now I need three copies of each value (on three different nodes), and now I need 300 nodes. I now feel that the simultaneous loss of 3 or more nodes is likely, so I have to add more nodes again, etc...
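The scaling worry above can be sketched numerically (illustrative only; the per-node failure probability `p` is a made-up assumption, and failures are assumed independent):

```python
import math

def nodes_needed(data_tb: float, rf: int, node_capacity_tb: float) -> int:
    # Each value is stored rf times, so total storage is data * rf.
    return math.ceil(data_tb * rf / node_capacity_tb)

def prob_at_least_k_failures(n: int, p: float, k: int) -> float:
    # P(at least k of n nodes fail), assuming independent failures
    # with per-node failure probability p over some time window.
    return 1 - sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

print(nodes_needed(100, 2, 1))  # 200 nodes for RF=2
print(nodes_needed(100, 3, 1))  # 300 nodes for RF=3
# A bigger cluster does see multiple simultaneous failures more often,
# which is the feedback loop the question describes:
print(prob_at_least_k_failures(300, 0.01, 2) > prob_at_least_k_failures(200, 0.01, 2))
```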

Surely this isn't actually how this scales? What is wrong with my logic?

Upvotes: 2

Views: 908

Answers (1)

Alex Ott

Reputation: 87329

There are several types of failures that you need to take into account:

  1. Individual node failure (hardware/OS/...) - the node has failed, either completely (data is lost) or partially (for example, a power adapter has failed)
  2. Rack/data center failure - nodes in a specific part of the data center fail, or the whole data center fails or becomes unreachable over the network

Replication helps avoid complete data unavailability, but its effectiveness also depends on the deployment strategy.

For example, if all your servers are in one data center and it becomes unavailable, you'll lose access to the data. Or if you didn't set up the cluster with rack-aware data placement, replicas could be placed in the same rack, and if that rack goes down, you lose those replicas.

Typically, it's recommended to use a replication factor of 3, and if you're planning a big deployment, definitely use rack-aware data placement. Be careful, though: the number of racks should match the RF (in cloud deployments, a rack is usually mapped to an availability zone).
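For example, rack- and DC-aware placement with RF=3 is configured at the keyspace level with `NetworkTopologyStrategy` (a sketch; the keyspace name `app_data` and data center name `dc1` are made up):

```sql
-- NetworkTopologyStrategy places replicas on distinct racks/DCs as
-- reported by the snitch (e.g. GossipingPropertyFileSnitch, which reads
-- dc/rack from cassandra-rackdc.properties on each node).
CREATE KEYSPACE app_data
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};
```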

Availability also depends on your business requirements. In the simplest case, if you use consistency level ONE or LOCAL_ONE, your data is available even if only one replica is up, but if your business logic requires stronger consistency, you need more replicas available. The replication factor also affects which consistency levels you can achieve: with RF=2 and CL=QUORUM you can't tolerate a single node failure, while with RF=3 that CL can still be achieved with one node down.
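The quorum arithmetic behind that last point can be sketched as follows (QUORUM requires a majority of replicas, i.e. floor(RF/2) + 1):

```python
def quorum(rf: int) -> int:
    # QUORUM needs a majority of the RF replicas to respond.
    return rf // 2 + 1

def tolerable_failures_at_quorum(rf: int) -> int:
    # How many replicas can be down while QUORUM still succeeds.
    return rf - quorum(rf)

print(quorum(2), tolerable_failures_at_quorum(2))  # 2 0 -> no node failure tolerated
print(quorum(3), tolerable_failures_at_quorum(3))  # 2 1 -> one node can be down
```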

Upvotes: 0
