steve landiss

Reputation: 1913

Can Cassandra guarantee the replication factor when a node is resyncing its data?

Let's say I have a 3 node cluster.

I am writing to node #1.

If node #2 in that cluster goes down, then comes back up and starts resyncing data from the other nodes while I continue writing to node #1, will the new writes be replicated to node #2 synchronously? That is, is the replication factor of each write honored at write time, or are new writes queued behind the resync?

Thanks, Steve

Upvotes: 2

Views: 619

Answers (2)

Aravind Chamakura

Reputation: 339

Node 2 will immediately start accepting new writes, along with any hints that the other nodes stored for it while it was down. It is a good idea to run a repair (nodetool repair) on the node after it is back up, which ensures its data is consistent with the other nodes.

Note that each column has a timestamp stored with it, which lets Cassandra determine which data is most recent when running a repair.

Upvotes: 0

Andy Tolbert

Reputation: 11638

Yes, provided that you are reading and writing at a consistency level that can tolerate one node being unavailable.

Consider the following scenario:

  1. You have a 3 node cluster with a keyspace 'ks' with a replication factor of 3.
  2. You are writing at a Consistency Level of 'QUORUM'.
  3. You are reading at a Consistency level of 'QUORUM'.
  4. Node 2 goes down for 10 minutes.
  5. Reads and writes continue to succeed while Node 2 is down, since 'QUORUM' only requires 2 nodes to be available (floor(3/2) + 1 = 2). While Node 2 is down, Node 1 and Node 3 both store 'hints' for Node 2.
  6. Node 2 comes back online. Node 1 and Node 3 replay to Node 2 the hints they recorded while it was down.
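The quorum arithmetic in step 5 can be checked with a small sketch. This is only an illustration of the formula, not Cassandra code; the function names `quorum` and `can_serve_quorum` are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond to a QUORUM request: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_serve_quorum(replication_factor: int, replicas_down: int) -> bool:
    """Return True if enough replicas are still alive to satisfy QUORUM."""
    alive = replication_factor - replicas_down
    return alive >= quorum(replication_factor)


if __name__ == "__main__":
    # RF = 3 with one node down, as in the scenario above.
    print(quorum(3))
    print(can_serve_quorum(3, replicas_down=1))
    print(can_serve_quorum(3, replicas_down=2))
```

With a replication factor of 3, one node can be down and QUORUM requests still succeed; with two nodes down they would fail.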

If a read happens and the coordinating Cassandra node detects that replicas are missing data or are inconsistent, it may execute a 'read repair'.

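If Node 2 was down for a long time, Node 1 and Node 3 may not retain all hints destined for it, because hints are only stored for a limited window (3 hours by default). In this case, an operator should run a repair on Node 2, and should consider running repairs on a scheduled basis in any case.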

Also note that when doing reads, if Cassandra finds a data mismatch during a digest request, it always treats the data with the newest timestamp as the correct one (see 'Why Cassandra doesn't need vector clocks').
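As a rough illustration of that "newest timestamp wins" rule, here is a minimal sketch. It is not how Cassandra is implemented internally; the `Cell` type and `reconcile` function are hypothetical names for this example, and tie-breaking between equal timestamps is deliberately left out.

```python
from typing import Iterable, NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for one replica's copy of a column."""
    value: str
    timestamp: int  # write timestamp, e.g. microseconds since epoch


def reconcile(cells: Iterable[Cell]) -> Cell:
    """Pick the copy with the newest write timestamp (last write wins)."""
    return max(cells, key=lambda cell: cell.timestamp)


if __name__ == "__main__":
    # Node 2 missed the second write while it was down.
    node1 = Cell(value="new", timestamp=200)
    node2 = Cell(value="old", timestamp=100)
    node3 = Cell(value="new", timestamp=200)
    print(reconcile([node1, node2, node3]).value)
```

In this sketch the stale copy from Node 2 loses to the newer copies on Node 1 and Node 3, which is the value a read repair would then write back to Node 2.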

Upvotes: 3
