How to Manage Node Failure with Cassandra Replication Factor 1?

Question

I have a three node Cassandra (DSE) cluster where I don't care about data loss so I've set my RF to 1. I was wondering how Cassandra would respond to read/write requests if a node goes down (I have CL=ALL in my requests right now).

Ideally, I'd like these requests to succeed if the data exists - just on the remaining available nodes till I replace the dead node. This keyspace is essentially a really huge cache; I can replace any of the data in the event of a loss.

TomerSan · Accepted Answer

(Disclaimer: I'm a ScyllaDB employee)

Assuming your partition key was unique enough, when using RF=1 each of your 3 nodes contains 1/3 of your data. BTW, in this case CL=ONE/ALL is basically the same as there's only 1 replica for your data and no High Availability (HA).

Requests for "existing" data from the 2 up nodes will succeed. Still, when one of the 3 nodes is down a 1/3 of your client requests (for the existing data) will not succeed, as basically 1/3 of you data is not available, until the down node comes up (note that nodetool repair is irrelevant when using RF=1), so I guess restore from snapshot (if you have one available) is the only option.

While the node is down, once you execute nodetool decommission, the token ranges will re-distribute between the 2 up nodes, but that will apply only for new writes and reads.

You can read more about the ring architecture here: http://docs.scylladb.com/architecture/ringarchitecture/

How to Manage Node Failure with Cassandra Replication Factor 1?

Answers (1)

Related Questions