How would you explain data (in)consistency to an audience with no background in distributed storage systems?

Data consistency is an important issue in distributed storage systems such as Amazon DynamoDB, Cassandra, Riak, and Windows Azure. It arises from replication, the technique these systems use to provide high performance, fault tolerance, and scalability.

A data consistency model is a formal way to characterize the data consistency problem. However, it is often too formal to explain to an audience with no background in distributed storage systems, let alone the notions of eventual consistency, causal consistency, sequential consistency, and so on.

Hence, an informal explanation would be better. In addition, a good explanation should cover the following three key points:

Simple examples, clear figures, and concise accounts that illustrate the data (in)consistency problem.

The idea that consistency comes in both weak and strong forms.

The influence of data (in)consistency on users and application programmers.

Upvotes: 1

Answers (1)

hengxin

Reputation: 1999

Answer My Own Question:

A consistency model is too formal and rigorous to explain to a general audience. Fortunately (or maybe unfortunately), most people have experienced data inconsistency. To users, data inconsistency usually shows up as strange behavior. Basically, a consistency model specifies which behaviors are allowed and which are not. The key point is that there are various consistency models of different strengths. The stronger the consistency model is, the fewer surprises the user gets. Strong consistency is desirable for both users and developers: users feel better about their data, and developers find it easier to program.

Typical consistency conditions include (but are not limited to) eventual consistency, causal consistency, sequential consistency, and atomicity, from weak to strong.

Eventual consistency is prevalent in today's distributed storage systems, with Amazon's Dynamo as the best-known example. Eventual consistency is quite weak (in theory) because it makes few promises. Consider the following scenario (shown in the figure):

data-inconsistency-eventual-blog http://i1.tietuku.com/48f5bfa9d6d925e0s.png

You have just finished your blog post and can't wait to publish it. You push the "publish" button and immediately refresh the page, only to find that your post is not there! This can happen when you publish your post to one replica node and the refresh is served by another. Take it easy. Refresh, refresh, refresh, and it will appear eventually.
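The scenario above can be imitated with a small sketch. It assumes two in-memory replicas and a manual replicate() step standing in for background replication; the class names, the method names, and the "my-blog" key are all hypothetical and do not correspond to any real storage system's API.

```python
class Replica:
    """One copy of the blog store (hypothetical, in-memory)."""

    def __init__(self, name):
        self.name = name
        self.posts = {}

    def read(self, post_id):
        # Returns None if this replica has not seen the post yet.
        return self.posts.get(post_id)


class BlogStore:
    """Two replicas with asynchronous (delayed) replication."""

    def __init__(self):
        self.a = Replica("A")
        self.b = Replica("B")
        self.pending = []  # writes accepted by A but not yet copied to B

    def publish(self, post_id, text):
        # The write is acknowledged as soon as replica A has it.
        self.a.posts[post_id] = text
        self.pending.append((post_id, text))

    def replicate(self):
        # Stand-in for background replication: copy pending writes to B.
        for post_id, text in self.pending:
            self.b.posts[post_id] = text
        self.pending.clear()


store = BlogStore()
store.publish("my-blog", "Hello, world!")

first_refresh = store.b.read("my-blog")  # served by B before replication
store.replicate()                        # replication catches up
later_refresh = store.b.read("my-blog")  # served by B after replication

print(first_refresh, later_refresh)
```

The first refresh is served by the replica that has not yet received the write, so the post looks missing; once replication catches up, the same replica returns it. Eventual consistency promises only the second part.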

Causal consistency is stronger than eventual consistency in that it guarantees that all users observe causally related events in their causal order. Consider the following made-up conversation on a social networking site (shown in the figure), excerpted from a paper:

data-inconsistency-causal-conversation

The mother cannot find her son and posts a status to her friends: "I think Son is missing!".
The son replies to his mother to let her know that he is just playing outside.
A friend observes this little conversation and posts a status in response: "What luck!".

If causality is not respected, a third user could perceive effects before their causes. As shown on the right of the figure, she might think the friend is pleased to hear of the Son's disappearance!

Mission-critical applications usually require much stronger data consistency, such as transactions.
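One way a replica can avoid this surprise is to hold back a message until the message it responds to is already visible. The sketch below is a simplified illustration of that idea, assuming each message names at most one message it depends on. The CausalReplica class and the message ids are hypothetical, and real systems track dependencies in more elaborate ways.

```python
class CausalReplica:
    """Shows a message only after the message it depends on is visible."""

    def __init__(self):
        self.visible = []   # message ids, in the order the user sees them
        self.buffered = []  # (msg_id, depends_on) pairs still waiting

    def receive(self, msg_id, depends_on=None):
        self.buffered.append((msg_id, depends_on))
        self._deliver_ready()

    def _deliver_ready(self):
        # Keep scanning until no buffered message can be released.
        progress = True
        while progress:
            progress = False
            for item in list(self.buffered):
                msg_id, dep = item
                if dep is None or dep in self.visible:
                    self.visible.append(msg_id)
                    self.buffered.remove(item)
                    progress = True


replica = CausalReplica()

# The messages reach this replica out of order.
replica.receive("what-luck", depends_on="just-playing")   # friend's status
replica.receive("son-missing")                            # mother's status
seen_midway = list(replica.visible)
replica.receive("just-playing", depends_on="son-missing") # son's reply

print(seen_midway, replica.visible)
```

Midway through, the third user sees only the mother's status; the friend's response stays hidden until the son's reply has arrived, so it never appears to be a reaction to the disappearance.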

[To Be Continued]

Upvotes: 1
