Howard

Reputation: 19825

AWS DynamoDB read after write consistency - how does it work theoretically?

Most NoSQL solutions offer only eventual consistency. Given that DynamoDB replicates data across three datacenters, how is read-after-write consistency maintained?

What would be a generic approach to this kind of problem? I think it is interesting, since even in MySQL replication data is replicated asynchronously.

Upvotes: 5

Views: 8791

Answers (3)

NoSQLKnowHow

Reputation: 4865

I'll tell you exactly how DynamoDB does this. No guessing.

In order for a write request to be acknowledged to the client, the write must be durable on two of the three storage nodes for that partition. One of the two storage nodes MUST be the leader node for that partition. The third storage node is probably updated as well, but on the off chance something happened, it may not be. DynamoDB will get that one updated as soon as it can.

When you request a strongly consistent read, that read comes from the leader storage node for the partition the item(s) are stored in.
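To make the two rules above concrete, here is a minimal toy sketch. It is not DynamoDB's actual code: the `StorageNode` and `Partition` classes and their methods are hypothetical names invented for illustration, and the model ignores leader election, retries, and catch-up of a lagging node.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node holding an in-memory key/value map."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key, value):
        """Persist the value; returns False if the node is unreachable."""
        if not self.up:
            return False
        self.data[key] = value
        return True


class Partition:
    """Toy partition: one leader plus two followers (assumed layout)."""

    def __init__(self):
        self.leader = StorageNode("leader")
        self.followers = [StorageNode("follower-1"), StorageNode("follower-2")]

    def write(self, key, value):
        # The leader must always be one of the nodes that made the write durable.
        if not self.leader.store(key, value):
            return False
        acks = 1 + sum(f.store(key, value) for f in self.followers)
        # Acknowledge to the client only when 2 of the 3 nodes have the write.
        # (A real system would also have to deal with the unacknowledged write
        # left on the leader when this fails; this sketch ignores that.)
        return acks >= 2

    def read(self, key, consistent=False):
        # Strongly consistent reads go to the leader. Other reads may hit any
        # node; here we always pick the last follower to make staleness visible.
        node = self.leader if consistent else self.followers[-1]
        return node.data.get(key)


partition = Partition()
partition.followers[1].up = False          # the third node misses the write
acknowledged = partition.write("k", "v1")  # leader + follower-1 is enough
```

In this sketch the write is still acknowledged with one follower down, a read with `consistent=True` returns the new value from the leader, and a default read served by the lagging follower returns nothing yet.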

Note: I worked on the DynamoDB team at the time this was written. Also, check out this video from re:Invent to learn more about the internal plumbing of DynamoDB.

Upvotes: 5

georgeaf99

Reputation: 202

I know I'm answering this question long after it was asked, but I thought I could contribute some helpful information...

In a distributed database the concept of a "master" is not particularly relevant anymore (at least for reads/writes). Each node should be able to perform reads and writes, so that read/write performance increases as the number of machines increases. If you want reads to be correct immediately after a write, the number of machines you write to plus the number of machines you read from must be greater than the total number of machines in the system.

Example: if you only write to 1 machine, then you must read from all of them to ensure that your data is not stale. Or, with 3 machines, if you write to 2 (a quorum in this case), you can also read from 2 and guarantee that your data is recent, because any two groups of 2 out of 3 share at least one machine.

NOTE: these assumptions change when a subset of nodes in the system crash.
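The overlap rule can be checked by brute force. The sketch below is only an illustration with hypothetical function names: it enumerates every possible write set and read set and confirms that they always share a node exactly when writes plus reads exceed the node count. It assumes no node crashes, per the note above.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """True if every write set of size w shares a node with every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def read_sees_latest_write(n, w, r):
    """The usual rule of thumb: W + R > N."""
    return w + r > n


# Compare the rule of thumb against exhaustive enumeration for 3 nodes.
agreement = all(
    quorums_overlap(3, w, r) == read_sees_latest_write(3, w, r)
    for w in range(1, 4)
    for r in range(1, 4)
)
```

For three nodes, writing to 1 and reading from 3 overlaps, as does writing to 2 and reading from 2, while writing to 1 and reading from 2 can miss the write.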

Upvotes: -1

Michael - sqlbot

Reputation: 179354

I'll use MySQL to illustrate the answer, since you mentioned it, though, obviously, neither of us is implying that DynamoDB runs on MySQL.

In a single network with one MySQL master and any number of slaves, the answer seems extremely straightforward -- for eventual consistency, fetch the answer from a randomly-selected slave; for read-after-write consistency, always fetch the answer from the master.
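As a rough sketch of that routing decision (the function name and its arguments are hypothetical, and `replicas` stands for the slaves described above):

```python
import random


def pick_server(master, replicas, read_after_write):
    """Choose which server should answer a read."""
    # Read-after-write consistency: only the master is guaranteed current.
    # Also fall back to the master when there are no replicas at all.
    if read_after_write or not replicas:
        return master
    # Eventual consistency: any replica will do, even if it lags slightly.
    return random.choice(replicas)
```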

"even in MySQL replication data is replicated asynchronously"

There's an important exception to that statement, and I suspect there's a good chance that it's closer to the reality of DynamoDB than any other alternative here: In a MySQL-compatible Galera cluster, replication among the masters is synchronous, because the masters collaborate on each transaction at commit-time, and a transaction that can't be committed to all of the masters will also throw an error on the master where it originated. A cluster like this technically can operate with only 2 nodes, but should not have fewer than three, because when there is a split in the cluster, any node that finds itself alone or in a group that is not a majority of the original cluster will roll itself up into a harmless little ball and refuse to service queries, because it knows it may be in an isolated minority and its data can no longer be trusted. So three is something of a magic number in a distributed environment like this, to avoid a catastrophic split-brain condition.
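The "magic number" argument can be sketched with a strict-majority check. This is a simplification I'm assuming for illustration (Galera's real quorum calculation is weighted and also accounts for nodes that leave gracefully), and `has_quorum` is a hypothetical helper, not a Galera API.

```python
def has_quorum(component_size, cluster_size):
    """True if this group of nodes is a strict majority of the original cluster."""
    return 2 * component_size > cluster_size


# Two-node cluster that splits: each side sees 1 of 2, so neither may serve.
two_node_split = [has_quorum(1, 2), has_quorum(1, 2)]

# Three-node cluster that splits 2 + 1: the pair keeps serving, the loner stops.
three_node_split = [has_quorum(2, 3), has_quorum(1, 3)]
```

With two nodes, any split takes the whole cluster offline; with three, one side of a split can always keep going.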

If we assume the "three geographically-distributed replicas" in DynamoDB are all "master" copies, they might operate with logic along the same lines as the synchronous masters you'd find with Galera. The solution would then be essentially the same, since that setup also allows any or all of the masters to have conventional subtended asynchronous slaves using MySQL native replication. The difference there is that you could fetch from any of the masters that is currently connected to the cluster if you wanted read-after-write consistency, since all of them are in sync; otherwise fetch from a slave.

The third scenario I can think of would be analogous to three geographically-dispersed MySQL masters in a circular replication configuration, which, again, supports subtended slaves off of each master, but has the additional problems that the masters are not synchronous and there is no conflict resolution capability -- not at all viable for this application, but for purposes of discussion, the objective could still be achieved if each "object" had some kind of highly-precise timestamp. When read-after-write consistency is needed, the solution here might be for the system serving the response to poll all of the masters to find the newest version, not returning an answer until all masters had been polled, or to read from a slave for eventual consistency.
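A minimal sketch of that "collaborate at read time" idea, assuming each master is modeled as a plain dict of timestamped versions and an unreachable master is represented by `None`. All names here are hypothetical, and the sketch takes the timestamps on trust, which a real system with skewed clocks could not do.

```python
from typing import NamedTuple


class Version(NamedTuple):
    timestamp: float  # assumed to be precise and comparable across masters
    value: str


def consistent_read(masters, key):
    """Poll every master and return the newest version of the key, or None."""
    found = []
    for master in masters:
        if master is None:
            # No answer can be returned until all masters have been polled.
            raise ConnectionError("a master is unreachable")
        if key in master:
            found.append(master[key])
    # The newest timestamp wins; default=None covers a key no master has.
    return max(found, key=lambda version: version.timestamp, default=None)


masters = [
    {"k": Version(1.0, "old")},
    {"k": Version(2.0, "new")},
    {},  # this master has not received the key yet
]
latest = consistent_read(masters, "k")
```

Here `latest` is the version with timestamp 2.0. Note that a single unreachable master blocks every consistent read, which is part of why this layout is not really viable.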

Essentially, if there's more than one "write master" then it would seem like the masters have no choice but to either collaborate at commit-time, or collaborate at consistent-read-time.

Interestingly, I think, in spite of some whining you can find in online opinion pieces about the disparity in pricing between the two read-consistency levels in DynamoDB, this analysis -- even as divorced from the reality of DynamoDB's internals as it is -- does seem to justify that discrepancy.

Eventually-consistent read replicas are essentially infinitely scalable (even with MySQL, where a master can easily serve several slaves, each of which can also easily serve several slaves of its own, each of which can serve several... ad infinitum) but read-after-write is not infinitely scalable, since by definition it would seem to require the involvement of a "more-authoritative" server, whatever that specifically means, thus justifying a higher price for reads where that level of consistency is required.

Upvotes: 3
