Ivan

Reputation: 462

What happens if a write fails in a Cassandra cluster when using QUORUM CL?

Suppose I have 3 nodes, RF is 3, and I'm using QUORUM CL. When I write a data record to the cluster, suppose the write succeeds on one node and fails on another, so the whole write request fails. What happens to the node where the write succeeded? Will it be rolled back automatically, or will the record be propagated to the other nodes via gossip, so that in the end all 3 nodes have the record even though the original request failed?
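To make the scenario concrete, here is a minimal sketch (plain Python, not real Cassandra; the `Replica` and `coordinator_write` names are invented for illustration) of a coordinator that needs QUORUM acks out of RF=3 and reports failure when it gets only one, without undoing the write on the replica that accepted it:

```python
# Simplified model of a QUORUM write against RF=3 (not real Cassandra).
RF = 3
QUORUM = RF // 2 + 1  # 2 acks required for RF=3

class Replica:
    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}  # key -> (value, timestamp)

    def write(self, key, value, ts):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = (value, ts)

def coordinator_write(replicas, key, value, ts):
    acks = 0
    for r in replicas:
        try:
            r.write(key, value, ts)
            acks += 1
        except ConnectionError:
            pass
    # The coordinator reports failure if fewer than QUORUM replicas acked,
    # but it does NOT roll back the replicas that already persisted the write.
    return acks >= QUORUM

replicas = [Replica("n1"), Replica("n2", up=False), Replica("n3", up=False)]
ok = coordinator_write(replicas, "k", "v", ts=100)
print(ok)                     # False: only 1 of the 2 required acks
print(replicas[0].data["k"])  # ('v', 100): the accepted write stays put
```

The point of the sketch is exactly the question above: the client sees a failure, yet n1 still holds the value.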

Upvotes: 8

Views: 4178

Answers (2)

mbaryu

Reputation: 189

shutty's answer is wrong in subtle ways, though the article it refers to is correct and an excellent source. The first three points appear correct:

  • The query coordinator will try to persist your write on all nodes according to RF=3. If 2 of them fail, the CL=QUORUM write is considered failed.
  • A single node which accepted the failed write will not roll it back. It will persist it to its memtable/disk as if nothing suspicious had happened.
  • Cassandra is an eventually consistent database, so it's absolutely fine for it to be in an inconsistent state for some period of time, as long as it converges to a consistent state in the future.

However, the last two appear wrong; here's the corrected version:

  • Next time you read (CL=QUORUM) the key you previously failed to write, if there still aren't enough nodes online, you'll get a failed read. If the two nodes that failed the write are online (and the one that succeeded is not), you'll receive the previous value, unaffected by the failed write.
  • If the node that succeeded in writing is also online, a QUORUM read will trigger read repair, causing the nodes that missed the newer value to be updated with it; the new value is then returned. (Note: 'newer' is in the timestamp sense, so it is possible that data written more recently in wall-clock time carries an older timestamp, meaning the cluster started in an inconsistent state.)
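The two corrected points above can be sketched as follows. This is a simplified model (the function name is invented; real Cassandra read repair operates per column and can run asynchronously): on a quorum read, the value with the newest write timestamp wins, regardless of how many replicas hold each value, and the stale replicas are repaired with it.

```python
# Sketch of last-write-wins resolution during a quorum read (simplified).
def quorum_read_with_repair(replica_data, key):
    # replica_data: node name -> {key: (value, timestamp)}
    responses = {n: d[key] for n, d in replica_data.items() if key in d}
    # Pick the value with the newest timestamp -- not the majority value.
    winner = max(responses.values(), key=lambda vt: vt[1])
    # Read repair: push the winning (value, ts) back to stale replicas.
    for d in replica_data.values():
        if d.get(key) != winner:
            d[key] = winner
    return winner[0]

nodes = {
    "n1": {"k": ("new", 200)},  # the node where the 'failed' write landed
    "n2": {"k": ("old", 100)},
    "n3": {"k": ("old", 100)},
}
result = quorum_read_with_repair(nodes, "k")
print(result)  # 'new' -- and n2/n3 have been repaired to ('new', 200)
```

Note that a majority-based rule would have returned 'old' here; timestamp-based resolution is why the minority value from the failed write can still win and spread.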

Upvotes: 17

shutty

Reputation: 3338

There's an article about it. TL;DR version:

  • The query coordinator will try to persist your write on all nodes according to RF=3. If 2 of them fail, the CL=QUORUM write is considered failed.
  • A single node which accepted the failed write will not roll it back. It will persist it to its memtable/disk as if nothing suspicious had happened.
  • Cassandra is an eventually consistent database, so it's absolutely fine for it to be in an inconsistent state for some period of time, as long as it converges to a consistent state in the future.
  • Next time you read (CL=QUORUM) the key you previously failed to write, if there still aren't enough nodes online, you'll get a failed read. If the other 2 nodes come back to life, they will form a read quorum (even if the third node's data differs for that key) and you'll receive the previous value, unaffected by the failed write.
  • If Cassandra detects such a conflict for a single key, it performs a read repair process, in which the conflicting minority nodes' data is overwritten by the data from the quorum's majority. So your node, which accepted the failed write, will self-heal the inconsistent row on the next successful quorum read.

Upvotes: 1
