Reputation: 334
I am reading DDIA. It says "possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178) synchronously, before returning results to the application [23], and a writer must read the latest state of a quorum of nodes before sending its writes."
What benefits are in requiring "a writer must read the latest state of a quorum of nodes before sending its writes?"
Read repair would fix the example in figure 1, and I can't come up with a reason or an example where a writer should perform a bunch of reads before doing the write, especially assuming that it's unconditional.
Upvotes: 3
Views: 274
Reputation: 1405
author here. Sorry that this is not written very clearly. We'll revise this explanation in the upcoming second edition.
The reason a writer must first read from a quorum is because it needs to get the highest timestamp (aka version/tag) of any value that has been written to a quorum of replicas. This is important when there are multiple clients making writes. If client A makes a write, and then client B overwrites the value written by A, it's important that B's write has a higher timestamp since otherwise it won't overwrite correctly. If B doesn't first make a read, it won't know A's timestamp, and therefore could not guarantee that B's timestamp would be higher.
For details, see Section 4 of the following paper:
Nancy Lynch and Alex Shvartsman. Robust Emulation of Shared Memory Using Dynamic Quorum-Acknowledged Broadcasts. 27th Annual International Symposium on Fault-Tolerant Computing (FTCS), June 1997.
The answer by itisravi is unfortunately not correct. The write of x=1 has not failed, it's just incomplete by the time that Reader A reads it. Holding off writes is not the answer, because you can never know whether the time is now right to go ahead with the write. Moreover, even if a write fails (i.e. it does not manage to get acknowledgements from a quorum within some timeout), clients may or may not read the partially written value; the failure of a write does not guarantee that the write didn't take effect.
Upvotes: 1
Reputation: 3571
The way I understand it, the author is not saying that the writers need to read the value but that they should check the latest state of the cluster's write quorum (i.e. can the writer see all 'w' nodes) and should hold off writing if it is not met. Otherwise, you can end up reading stale data of a write that was never deemed successful in the first place, even if the subsequent read was a 'read-repair', thereby violating linearizability.
In the quoted example (n=3, w=3, r=2), say the writer was only connected to Replica-1 (and not all 3 of them) and yet initiated a write of x=1, which of course went on to be considered a failure (since replicas 2 and 3 did not get updated). Now if the Reader A and Reader B did the read-repair as per the diagram, using latest-write-wins, all 3 replicas will have x=1. This is incorrect because readers must have gotten x=0 since the earlier write was deemed a failure.
Note that the readers A and B could have begun after we get the response for the write and we would still end up in the same situation. It is also possible that you can end up in this state even if the writer can talk to all 3 replicas but say the write failed on 2 of them due to disk errors. So checking the quorum before write is only a partial requirement.
In summary, any leaderless replication just relying on quorum based consistency is not 100% linearizable.
Upvotes: 0