rishabh mittal

Reputation: 71

How does the Raft algorithm maintain strong read consistency when a write failure is followed by a node failure?

Consider three nodes (A, B, C) receiving key/value data, with the following sequence of events:

  1. Node A, the leader, receives the key/value pair (1:15).
  2. It replicates the entry to node B and node C.
  3. Node B writes the entry to its pre-commit log.
  4. Node C fails to write the entry.
  5. The ack from node B is lost.
  6. Node A fails the entry and returns a failure to the client.
  7. Node A is still the leader, and B is not in the quorum.
  8. The client reads key 1 from node A, which returns the old value.
  9. Node A goes down.
  10. Node B and node C are up.
  11. Node B now has the entry in its pre-commit log, and node C doesn't.

How does log matching happen at this point? Is node B going to commit that entry or discard it? If it commits the entry, the earlier read was inconsistent; if it discards the entry, there could be data loss in other cases.

Upvotes: 2

Views: 2491

Answers (1)

rystsov

Reputation: 1928

The error is in step 8. Every read operation must be replicated to the other nodes; otherwise you risk returning stale data. The leader should serve a read only after it has written a dummy value to the log. In your case (B is offline), the "read" must reach nodes A and C, so that when node B comes back online and A dies, C is able to invalidate B's records.
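To make the idea concrete, here is a minimal sketch of a read that goes through the log first. It is a toy model, not a Raft implementation: the `Node` and `Leader` classes, the `"noop"` payload, and the in-memory `state` dict are all hypothetical names chosen for illustration, and elections, terms changing, and log repair are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    online: bool = True
    log: list = field(default_factory=list)  # entries are (term, payload) tuples

    def append(self, entry) -> bool:
        """Accept an entry from the leader; an offline node never acknowledges."""
        if not self.online:
            return False
        self.log.append(entry)
        return True


class Leader(Node):
    def __init__(self, name, peers, term=1):
        super().__init__(name)
        self.peers = peers
        self.term = term
        self.state = {}  # committed key/value data (assumed already applied)

    def replicate(self, payload) -> bool:
        """Append locally, send to every peer, and report whether a majority holds the entry."""
        entry = (self.term, payload)
        self.log.append(entry)
        acks = 1 + sum(peer.append(entry) for peer in self.peers)  # 1 = the leader itself
        cluster_size = len(self.peers) + 1
        return acks > cluster_size // 2

    def read(self, key):
        """Serve a read only after a dummy entry has reached a quorum."""
        if not self.replicate("noop"):
            raise RuntimeError("no quorum: refusing to serve a possibly stale read")
        return self.state.get(key)


# The situation from the question: B is offline, A is the leader.
b = Node("B", online=False)
c = Node("C")
a = Leader("A", peers=[b, c])
a.state[1] = 10   # hypothetical old value for key 1
print(a.read(1))  # answered only after both A and C hold the dummy entry
```

In this sketch the read leaves a trace in C's log, which is what later lets C win over B's unacknowledged entry. If C were offline as well, `read` would raise instead of answering from A's local state.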

This is a tricky problem, and even etcd ran into it in the past (it is now fixed).

Upvotes: 7
