Jason Mantra

Reputation: 105

Why would the coordinator store hints about dead replicas if a replica node for the row is known to be down ahead of time?

According to the DataStax online docs (https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_about_hh_c.html), the following is stated:

During a write operation, when hinted handoff is enabled and consistency can be met, the coordinator stores a hint about dead replicas in the local system.hints table under either of these conditions:

- A replica node for the row is known to be down ahead of time.
- A replica node doesn't respond to the write request.

What confuses me is why the first bullet point is a condition for storing hints when the coordinator already knows ahead of time that the node is down.

According to the docs here (https://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure), the only time Cassandra will fail a write is when too few replicas are alive when the coordinator receives the request.

From what I've read so far, hints are utilized only when the required replicas are alive at the time the coordinator receives the request and one or more of them become unresponsive afterwards. The first bullet point, however, says that hints are stored when a replica node is already down. If Cassandra automatically fails a write when too few replicas are alive, what's the point of storing hints for a write that has already been deemed a failure?
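To make the failure case concrete, here is how I read it, sketched with the DataStax Python driver (the contact point, keyspace, and table ks.t are placeholders I made up):

# Sketch of my reading; assumes a keyspace "ks" with replication
# factor 3 and a hypothetical table ks.t (id int PRIMARY KEY, val text).
from cassandra import ConsistencyLevel, Unavailable
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect()

write = SimpleStatement(
    "INSERT INTO ks.t (id, val) VALUES (1, 'x')",
    consistency_level=ConsistencyLevel.QUORUM,
)

try:
    session.execute(write)
except Unavailable as exc:
    # Too few replicas were alive when the coordinator received the
    # request, so the write was rejected outright.
    print("rejected:", exc.alive_replicas, "alive,",
          exc.required_replicas, "required")

With 2 of 3 replicas down, I'd expect the Unavailable branch; with only 1 of 3 down, I'd expect success. My confusion is about what hints are good for in the former case.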

Upvotes: 0

Views: 212

Answers (2)

Simon Fontana Oscarsson

Reputation: 2124

In Cassandra there is a thing called consistency level (CL) that is set for every request. The CL defines how many nodes have to respond to a request for it to be successful.

For example, say you have a replication factor of 3. You don't necessarily need all 3 nodes to respond in order to know you got a correct result; a majority (called quorum) is enough. Don't misinterpret this, though: if the request is a write, the coordinator still writes to all 3 replicas, since you want the data replicated on all nodes. But because you specified CL quorum, only a majority, i.e. 2 nodes, have to respond back to the coordinator for the client request to be a success.
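A minimal Python sketch of that majority arithmetic (quorum is a strict majority of the replication factor):

# Quorum for a given replication factor: floor(RF / 2) + 1.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

print(quorum(3))  # 2 -- two of three replicas must acknowledge
print(quorum(5))  # 3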

So the full write path for a client request with CL quorum could be the following:

                                      --write--> Node 1
Client --write request--> Coordinator --write--> Node 2
                                      --write--> Node 3

                                <--success-- Node 1
Client <--success-- Coordinator <--success-- Node 2
                                <--success-- Node 3

This is the case where all 3 nodes respond with a success to the coordinator.

We could also get a timeout from one of the nodes:

                                      --write--> Node 1
Client --write request--> Coordinator --write--> Node 2
                                      --write--> Node 3

                                <--success-- Node 1
Client <--success-- Coordinator <--success-- Node 2
                                <--timeout-- Node 3

Since we only need success from a majority of the nodes, the request will still be successful. In this case a hint will be stored on the coordinator, which will try to replay the write (send it again) once Node 3 is responsive.

In the case that the node is already known to be down or unresponsive, the same thing happens: the coordinator only cares whether enough nodes respond to satisfy the consistency level.
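A rough client-side sketch of both outcomes with the DataStax Python driver (the table ks.t is a made-up placeholder; replication factor 3 assumed): if a quorum responds, execute() returns normally and any hint for the dead replica is written behind the scenes; only when fewer than a quorum respond in time does the driver surface a WriteTimeout.

from cassandra import ConsistencyLevel, WriteTimeout
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect()

write = SimpleStatement(
    "INSERT INTO ks.t (id, val) VALUES (2, 'y')",
    consistency_level=ConsistencyLevel.QUORUM,
)

try:
    session.execute(write)
    # 2 of 3 replicas answered: success from the client's point of view,
    # even if Node 3 was down. The coordinator stores a hint for Node 3
    # and replays it later; none of that is visible here.
    print("write succeeded")
except WriteTimeout as exc:
    # Fewer than a quorum responded in time, so the request fails even
    # though some replicas may have applied the write.
    print("timed out:", exc.received_responses, "of",
          exc.required_responses, "responses")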

Upvotes: 0

MarcintheCloud

Reputation: 1653

In the Cassandra world, there's usually an expectation that a node left in the "down" state will eventually come back up. Otherwise you, as the operator, would decommission it.

Cassandra doesn't know whether that node will come back up, so it stores the hint in case the node does come back before the hint window expires. Recovering from a hint is faster than recovering through Cassandra's repair process (which should always be running anyway).

Upvotes: 0
