Reputation: 3933
From what I have read so far, Cassandra uses timestamps provided by the client or the coordinator to resolve conflicts. If Cassandra receives a write for a cell that already exists, it keeps the version with the higher timestamp.
With clock skew, even when there are no concurrent updates and the ALL consistency level is used, a client can update a value and receive an ACK from all servers while the stored value is not actually changed, because the provided timestamp was older than the timestamp of the value already in that cell. Such behaviour violates causal consistency, which AFAIK R + W > N was supposed to provide?
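To make the scenario concrete, here is a minimal sketch of last-write-wins resolution on a single replica. The `Cell` class and `lww_write` function are hypothetical names for illustration, not Cassandra's internals, and tie-breaking on equal timestamps is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: str
    timestamp: int  # integer timestamp supplied by the writer's clock


def lww_write(current: Optional[Cell], incoming: Cell) -> Cell:
    """Keep whichever cell carries the higher timestamp (last write wins)."""
    if current is None or incoming.timestamp > current.timestamp:
        return incoming
    return current  # the write is acknowledged but has no visible effect


replica = None
# First write comes from a writer whose clock runs fast.
replica = lww_write(replica, Cell("v1", 1_000_005))
# Second write happens later in real time, but its writer's clock is behind.
replica = lww_write(replica, Cell("v2", 1_000_002))
print(replica.value)  # v1
```

Both writes are accepted without error, yet the later one is silently discarded because its timestamp is lower.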
It seems to me that a better solution would be to use logical clocks (Lamport or vector clocks) to pick the newest value, and to fall back to actual timestamps (or another strategy that can be provided by the client) only when a concurrent update is detected during read repair. AFAIK this is more or less the approach that Dynamo uses, right?
As I am probably missing something, can you let me know why Cassandra doesn't use such an approach?
Upvotes: 3
Views: 500
Reputation: 136
Per the CAP theorem, a strongly consistent system becomes unavailable during a network partition. Logical clocks are strongly consistent, so they too become unavailable during a partition.
In practice, a logical clock is implemented with a quorum-based algorithm, which becomes unavailable on the side of the partition that has fewer nodes. So during a partition, in your example, with a logical clock either A or B will take writes, and the other node will have no access to the logical clock and will be unable to serve writes.
Cassandra developers had three choices:
Cassandra went with 3, but also provides a server-side default of 2 to simplify clients that don't need a logical clock. How to generate logical time on the client side that fits in the same size integer as a clock time (in millis) is a separate (solved) problem.
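One possible way to get a client-side logical time that still fits in a clock-sized integer is to take the wall clock but never let the value go backwards or repeat. The sketch below is an assumption about what such a generator could look like, not necessarily what the answer has in mind. `MonotonicTimestamps` and its methods are hypothetical names, and microseconds are used because that is the unit Cassandra conventionally uses for write timestamps.

```python
import time
from typing import Callable


def _wall_clock_micros() -> int:
    return time.time_ns() // 1000


class MonotonicTimestamps:
    """Hands out integer timestamps that track the wall clock but never decrease."""

    def __init__(self, now_micros: Callable[[], int] = _wall_clock_micros):
        self._now = now_micros
        self._last = 0

    def next(self) -> int:
        # Use the wall clock when it has advanced, otherwise bump the last value.
        self._last = max(self._now(), self._last + 1)
        return self._last

    def observe(self, seen: int) -> None:
        # Fold in a timestamp read from the database so later writes sort after it.
        self._last = max(self._last, seen)


# Demonstration with a frozen fake clock (assumption: the clock is stuck at 100).
gen = MonotonicTimestamps(now_micros=lambda: 100)
first = gen.next()
second = gen.next()
gen.observe(500)
third = gen.next()
print(first, second, third)  # 100 101 501
```

The `observe` step is what makes this behave like a logical clock: a client that has read a value stamped 500 will stamp its next write above 500 even if its own clock is behind.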
Upvotes: 1
Reputation: 1740
Cassandra is an eventually consistent system, and when it was designed (at Facebook) the engineers had to decide how to handle conflicts. They had several options: Last Update Wins, running a code handler on conflict, delegating conflict resolution to clients, etc.
I guess they went with Last Update Wins because of its simplicity. It has several edge cases, but they were designing Cassandra for their own purposes and that approach worked for them.
The approach you are talking about is valid: the system returns all conflicting values to the client, and the client decides what to do with them. It does add extra complexity to client code, which may not be desirable.
Edit based on comment: why wall clock and not logical clock
Logical clocks (vector clocks) help to detect concurrent updates, but they don't help to decide how to resolve the conflict. E.g. if there are two concurrent updates to the same key, the vector clocks will detect them, but there is no way to decide which one to use.
Since Cassandra does not return conflicting versions (by design) and does not merge them, it needs a way to decide which record to use. The designers chose the Last Update Wins strategy. One option for this strategy is to use the wall clock to decide.
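A minimal sketch of vector clock comparison shows why detection is not the same as resolution. The `compare` function is a hypothetical helper, and clocks are represented as plain dictionaries from node name to counter, which is an assumption for illustration.

```python
from typing import Dict


def compare(a: Dict[str, int], b: Dict[str, int]) -> str:
    """Return how vector clock `a` relates to vector clock `b`."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # a happened before b, so b can safely replace a
    if b_le_a:
        return "after"   # b happened before a, so a can safely replace b
    return "concurrent"  # neither dominates: a conflict is detected, not resolved


print(compare({"A": 2, "B": 1}, {"A": 2, "B": 2}))  # before
print(compare({"A": 3, "B": 1}, {"A": 2, "B": 2}))  # concurrent
```

In the "concurrent" case the comparison gives no winner, so some other rule (returning both versions, merging, or a timestamp) still has to pick the value.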
P.S. Lamport timestamps would provide a total order, but they require a completely different flow of data in the system.
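As a rough illustration of that different data flow, here is a sketch of a Lamport clock where ties are broken by node id to get a total order. `LamportClock` is a hypothetical class, and the key point is the `observe` call: every node has to see the counters of others before stamping its own events.

```python
from typing import Tuple


class LamportClock:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counter = 0

    def tick(self) -> Tuple[int, str]:
        """Stamp a local event; the node id breaks ties between equal counters."""
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, remote_counter: int) -> None:
        """Merge a counter received from another node before the next tick."""
        self.counter = max(self.counter, remote_counter)


a = LamportClock("A")
b = LamportClock("B")

t1 = a.tick()          # (1, "A")
b.observe(t1[0])       # B must learn about A's write first
t2 = b.tick()          # (2, "B"), ordered after t1
t3 = a.tick()          # (2, "A"), concurrent with t2; node id decides the order

print(sorted([t2, t3, t1]))  # [(1, 'A'), (2, 'A'), (2, 'B')]
```

Without the `observe` step, B would stamp its write `(1, "B")` and causality between the two writes would be lost, which is why counters have to travel with every message.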
Upvotes: 3