Reputation: 6845
Imagine the simplest possible Cassandra table on a two-node Cassandra cluster.
I issue a command to delete a record. Suppose node #2 is down at that moment. The Cassandra client receives a success response from node #1 and happily continues (the consistency level for the command is ONE).
Then node #2 comes back up and tries to sync data with node #1. Node #2 claims that it has a record that node #1 doesn't. How do they figure out that a deletion removed the record from node #1, rather than an insert having added a record to node #2 that never reached node #1 for some reason? I ask about deletions because I assume that, after a deletion, Cassandra doesn't store a timestamp for the deleted item.
Any useful links on the issue would be appreciated.
What I have in mind in particular is either a hinted handoff scenario or read repair.
Upvotes: 1
Views: 194
Reputation: 573
Cassandra Repair takes care of these situations.
When you delete data in Cassandra, it is not removed immediately. Instead, Cassandra writes a tombstone, a marker with its own timestamp indicating that the row or column is deleted. Tombstones are kept for at least gc_grace_seconds, after which compaction may remove them.
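A minimal sketch of how a tombstone lets two replicas tell a deletion from a missed insert, assuming a simplified last-write-wins rule: each version carries a write timestamp, the newer one wins, and a tombstone wins a tie. The `Cell` class and `reconcile` function are hypothetical names for illustration, not Cassandra's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (hypothetical encoding)
    timestamp: int        # write timestamp supplied with the mutation

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Pick the winning version of one cell from two replicas."""
    if a is None:
        return b
    if b is None:
        return a
    if a.timestamp != b.timestamp:
        # The version with the newer timestamp wins.
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: assume the deletion takes precedence.
    return a if a.is_tombstone else b


# Node #1 saw the delete at t=200; node #2 still has the write from t=100.
deleted = reconcile(Cell(None, 200), Cell("alice", 100))

# Node #1 has nothing at all; node #2 has an insert that never reached node #1.
missed_insert = reconcile(None, Cell("alice", 100))
```

In the first case the tombstone wins and the record stays deleted; in the second there is no tombstone, so the live record is treated as a write that node #1 missed.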
If you run repair regularly:
When you run repair, the nodes sync both the data and the tombstones, so every replica learns about the deletion. After gc_grace_seconds, the tombstones are removed.
If you do not run repair regularly:
Suppose gc_grace_seconds is 10 days and you delete data on node #1 while node #2 is down. Cassandra creates a tombstone for the deleted data on node #1. If you later bring node #2 back up but do not run repair, then after gc_grace_seconds (10 days) the tombstone is removed from node #1, while node #2 still holds the original data and never received the tombstone. If you read the data now, it will reappear instead of staying deleted.
Hence you must run repair regularly on the Cassandra cluster, at least once within every gc_grace_seconds window.
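The two cases can be simulated with a toy model. This is only a sketch under stated assumptions: time is counted in days, each node is a plain dict, and `purge_tombstones`, `repair`, `read`, and `scenario` are hypothetical helpers, not how Cassandra implements compaction or repair.

```python
from typing import Dict, Optional, Tuple

# A stored version is (value, timestamp); value None marks a tombstone.
Version = Tuple[Optional[str], int]
Node = Dict[str, Version]


def purge_tombstones(node: Node, now: int, gc_grace: int) -> None:
    """Drop tombstones older than gc_grace, as compaction eventually would."""
    expired = [k for k, (v, ts) in node.items() if v is None and now - ts > gc_grace]
    for key in expired:
        del node[key]


def repair(a: Node, b: Node) -> None:
    """Make both nodes hold the newest version of every key."""
    for key in set(a) | set(b):
        versions = [n[key] for n in (a, b) if key in n]
        newest = max(versions, key=lambda v: v[1])
        a[key] = b[key] = newest


def read(node: Node, key: str) -> Optional[str]:
    """Return the live value, or None if the key is absent or deleted."""
    version = node.get(key)
    return None if version is None else version[0]


def scenario(repair_in_time: bool) -> Optional[str]:
    gc_grace = 10                      # days (assumed value from the example)
    node1: Node = {"k": ("alice", 0)}  # both replicas start with the record
    node2: Node = {"k": ("alice", 0)}
    node1["k"] = (None, 1)             # day 1: delete reaches node #1 only
    if repair_in_time:
        repair(node1, node2)           # repair runs before gc_grace expires
    purge_tombstones(node1, now=12, gc_grace=gc_grace)
    purge_tombstones(node2, now=12, gc_grace=gc_grace)
    repair(node1, node2)               # a late repair or read repair
    return read(node1, "k")


with_repair = scenario(repair_in_time=True)      # record stays deleted
without_repair = scenario(repair_in_time=False)  # record comes back
```

Without the timely repair, node #2's old copy is the only version left once the tombstone is purged, so the late sync copies it back to node #1.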
See the Cassandra docs on deletes: http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_about_deletes_c.html
Upvotes: 1