Reputation: 23
I'm running Cassandra 2.2.1 on a 3-node cluster with RF=3. If I perform simple deletes at QUORUM on a bunch of entries and then verify the results with a select at QUORUM, some entries that should have been deleted still persist in the table. The delete queries, which were issued through the Java driver, completed successfully without exception. I also use a retry policy to handle failed deletes/writes, but the policy is never invoked for these deletes because they 'succeed'. I can reproduce the problem 100% of the time; it usually starts happening after I've issued around 100 deletes into the table. I understand how tombstones and the gc grace period work, and this is not a case of resurrected deletes. I read somewhere that it could be an NTP issue, but all 3 nodes sync to the same clock and there's no drift as far as I can tell. I can share logs or anything else required to find the root cause. Thanks!
Update: I resolved the problem, and it seems to be a weird race condition that is either time related or sequence related. If there is some clock drift between nodes, a delete can be ignored when the timestamp it is tagged with is earlier than the timestamp of the insert it targets.
For example:
- The insert is issued via node 1 and tagged with timestamp T1 (node 1's clock).
- The delete comes into the system via node 3 but is tagged with timestamp T0, which is earlier than T1.
- The system concludes that the insert occurred later, so it ignores the delete.
This gives the illusion that the delete executed ahead of the insert, depending on the timestamps assigned by the respective nodes.
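The scenario above can be reproduced with a toy model of timestamp-based (last-write-wins) reconciliation. This is only a sketch of the idea, not Cassandra code: `LwwTable`, `LwwCell`, and `LwwDemo` are hypothetical names, and the timestamps are plain numbers standing in for the T0/T1 values in the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of last-write-wins reconciliation. Hypothetical classes, not Cassandra internals.
final class LwwCell {
    final String value;   // null marks a tombstone (a delete)
    final long timestamp; // write timestamp attached to the mutation

    LwwCell(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    boolean isTombstone() {
        return value == null;
    }
}

final class LwwTable {
    private final Map<String, LwwCell> cells = new HashMap<>();

    void insert(String key, String value, long timestamp) {
        apply(key, new LwwCell(value, timestamp));
    }

    void delete(String key, long timestamp) {
        apply(key, new LwwCell(null, timestamp));
    }

    // Keep whichever mutation carries the higher timestamp, regardless of arrival order.
    // On a tie the tombstone is kept, mirroring Cassandra's preference for deletes.
    private void apply(String key, LwwCell incoming) {
        LwwCell current = cells.get(key);
        boolean newer = current == null
                || incoming.timestamp > current.timestamp
                || (incoming.timestamp == current.timestamp && incoming.isTombstone());
        if (newer) {
            cells.put(key, incoming);
        }
    }

    // Returns null when the key is absent or shadowed by a tombstone.
    String read(String key) {
        LwwCell cell = cells.get(key);
        return (cell == null || cell.isTombstone()) ? null : cell.value;
    }
}

final class LwwDemo {
    public static void main(String[] args) {
        LwwTable table = new LwwTable();
        table.insert("k", "v1", 1000L); // insert tagged T1 = 1000
        table.delete("k", 999L);        // delete arrives later but is tagged T0 = 999
        System.out.println(table.read("k")); // the row is still visible

        table.delete("k", 1001L);       // delete tagged after the insert
        System.out.println(table.read("k")); // now the row is gone
    }
}
```

The delete tagged 999 is a successful write from the client's point of view, which matches the observation that no exception is thrown and the retry policy never fires.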
Allowing sufficient time between insert and delete resolved my issue although I'm not quite sure what the real root cause was.
Upvotes: 1
Views: 225
Reputation: 477
Another option is to enable client-side timestamps (instead of server-side timestamps, which is what you currently have).
If the same client issues the insert/update/delete, this ensures that the timestamps are in line with the order in which the operations were invoked.
Using client-side timestamps removes the need to leave a "sufficient time" between the insert/update and the delete.
Please note that correct timestamps are also needed for cases in which two consecutive writes update the same "key" (and these bugs are harder to detect :( ). Client-side timestamps resolve such issues as well (given that the same client issues the requests).
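As a rough illustration of the idea, the sketch below generates strictly increasing microsecond timestamps in a single client process and attaches them with CQL's `USING TIMESTAMP` clause. The class names and the `ks.entries` table with its `id` and `payload` columns are made up for the example. The DataStax Java driver can also generate client-side timestamps itself through a timestamp generator configured on the cluster builder; check the documentation for your driver version before relying on that.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hands out strictly increasing microsecond timestamps within one client process.
// Hypothetical helper; it does not protect against clock differences between clients.
final class MonotonicMicros {
    private final AtomicLong last = new AtomicLong();

    long next() {
        long now = System.currentTimeMillis() * 1000L; // microseconds since the epoch
        // If the clock has not advanced (or went backwards), bump the previous value by one.
        return last.updateAndGet(prev -> Math.max(now, prev + 1));
    }
}

// Builds CQL strings that carry an explicit write timestamp.
// The keyspace, table, and column names are assumptions for illustration.
final class TimestampedCql {
    static String insert(long timestampMicros) {
        return "INSERT INTO ks.entries (id, payload) VALUES (?, ?) USING TIMESTAMP " + timestampMicros;
    }

    static String delete(long timestampMicros) {
        return "DELETE FROM ks.entries USING TIMESTAMP " + timestampMicros + " WHERE id = ?";
    }
}

final class ClientTimestampDemo {
    public static void main(String[] args) {
        MonotonicMicros clock = new MonotonicMicros();
        // The delete is always tagged later than the insert issued before it,
        // no matter which coordinator node receives each statement.
        System.out.println(TimestampedCql.insert(clock.next()));
        System.out.println(TimestampedCql.delete(clock.next()));
    }
}
```

Because both statements take their timestamp from the same generator, the ordering no longer depends on the clocks of the coordinator nodes, only on the order in which the client issued the operations.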
Upvotes: 2
Reputation: 2283
How much time do you have between the delete and the select? Since Cassandra has "eventually consistent" behaviour, adding a delay between the delete and the select may solve the issue.
Upvotes: 1