Reputation: 3262
I have a table named 'holder' which has the single partition in which for every one hour we will have 60K entries,
I have another table named 'holderhistory' which has the 'date' as partitionId, so every day's record from 'holder' table will be copied to the 'holderhistory'
There will be a job running in the application
i) which collects all the older entries in holder table and copy to the holderhistory table
ii) Delete the older entries from holder table
NOW the issue is - there will be too many tombstones created in the holder table.
As default the tombstones will be cleared after 10 days (864000 seconds) gc_grace_seconds
But I don't want to keep the tombstone for more than 3 hours,
1) so It is good to set the gc_grace_seconds to 3 hours?
2) Or It is good to set the default_time_to_live to 3 hours?
Which is the best solution for deleting the tombstone?
Also what are the consequence on reducing the gc_grace_seconds from 10 days to 3 hours? where we will have impact?
Anyhelp is appreciated.
Upvotes: 1
Views: 2980
Reputation: 51
You might also find this support blog article useful:
https://academy.datastax.com/support-blog/cleaning-tombstones-datastax-dse-and-apache-cassandra
Upvotes: 0
Reputation: 441
To answer your particular case : as the table 'holder' contains only one partition, you can delete the whole partition with a single "delete by partition key" statement, effectively creating a single tombstone.
If you delete the partition once a day, you'll end up with 1 tombstone per day... that's quite acceptable.
1) with gc_grace_seconds
equals 3 hours, and if RF > 1, you will not be guaranteed to recover consistently from a node failure longer than 3 hours
2) with default_time_to_live
equals 3 hours, each record will be deleted by creating a tombstone 3 hours after insertion
So you could keep default gc_grace_seconds set to 10 days, and take care to delete your daily records with something like DELETE FROM table WHERE PartitionKey = X
EDIT: Answering to your comment about hinted handoff...
Let's say RF = 3
, gc_grace_second = 3h
and a node goes down. The 2 others replicas continue to receive mutations (insert, update, delete), but they can't replicate them to the offline node. In that case, hints will be stored on disk temporarily, to be sent later if the dead node comes back.
But a hint expires after gc_grace_seconds
, after what it will never been sent.
Now if you delete a row, it will generate a tombstone in the sstables of the 2 replicas and a hint in the coordinator node. After 3 hours, the tombstones are removed from the online nodes by the compaction manager, and the hint expires.
Later when your dead node comes back, it still have the row, and it can't know that this row has been deleted because no hint and no more tombstone exist on replicas... thus it's a zombie row.
Upvotes: 1
Reputation: 635
If you reduce the GCGraceSeconds parameter too low and the recovery time of any node longer than the GCGraceSeconds, in such case, once one of these nodes came back online, it would mistakenly think that all of the nodes that had received the delete had actually missed a write and it would start repairing all of the other nodes. I would recommend to use efault_time_to_live and give a try.
Upvotes: 1