Reputation: 121
I have a Cassandra cluster of 4 nodes that ingests around 100 GB of data daily. The TTL is set to 12 hours on all inserted records; however, gc_grace_seconds is left at its 10-day default. As a result, a large amount of expired data has accumulated on disk. I now want to change the grace period to 0, but I'm not sure how heavy the resulting compaction would be, given the expired data that has built up over roughly 4 days. Any ideas or recommendations?
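For reference, the change itself is a one-line schema alteration (keyspace and table names below are placeholders):

```shell
# Lower gc_grace_seconds to 0 for one table.
# "my_keyspace" and "my_table" are illustrative names.
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 0;"

# Verify the new setting took effect.
cqlsh -e "DESCRIBE TABLE my_keyspace.my_table;"
```

One caveat: gc_grace_seconds also bounds how long repairs and hints have to propagate deletions, so with a value of 0 a node that missed a delete can resurrect data. For a pure-TTL workload with no explicit DELETEs, that risk is much smaller.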
Upvotes: 2
Views: 170
Reputation: 38
This will depend on what kind of operations you run against Cassandra, because the compaction strategy has to match your workload to perform well. First, read this link:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_configure_compaction_t.html
Read about all the compaction strategies and the grace period; that should solve your problem. In my view, you should lower the grace period but keep it offset from the TTL by some margin; otherwise expiration and tombstone collection will happen together and Cassandra's performance will suffer.
Upvotes: 0
Reputation: 4426
The reality is going to depend on how intermixed your data is, how much compaction has happened in the past, and how much extra IO you can tolerate. Generally speaking, your (now expired) data has likely been combined with other data that may still be alive. With size-tiered compaction, it may be grouped into a very large file by now, which won't compact again until min_threshold (typically 4) files of the same size exist - and with a new, lower gc_grace_seconds, that may never happen. Date-tiered compaction is designed to drop entire SSTables once the whole file is expired - if you didn't start with DTCS, you likely have files that aren't fully expired.
In your case, the easiest option may be a major compaction (nodetool compact keyspace table), which takes ALL files and compacts them into one single large SSTable, purging all tombstoned data immediately. You'll end up with one big file (usually a negative, for the reason described above - it won't compact naturally again), but the expired data is purged right away.
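The major compaction described above is a single nodetool invocation (keyspace and table names are placeholders):

```shell
# Major compaction: merge all SSTables of one table into a single SSTable,
# purging tombstones older than gc_grace_seconds along the way.
nodetool compact my_keyspace my_table

# On Cassandra 2.2 and newer, -s / --split-output writes several smaller
# SSTables instead of one giant one, sidestepping the "one big file that
# never compacts again" problem.
nodetool compact -s my_keyspace my_table
```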
If you don't have enough space for a major compaction, you can compact each file one at a time using the JMX 'forceUserDefinedCompaction' endpoint. It's pretty trivial to do manually, or there are scripts online that will assist you in this process ( http://www.encql.com/product/encql-tombstone-cleaner/ for $50 if it's that important to you ).
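As a sketch of the per-file route: on Cassandra 3.4 and newer, user-defined compaction is also exposed through nodetool, so you don't need raw JMX (the SSTable path below is illustrative; on older versions you'd call the forceUserDefinedCompaction JMX operation directly, as described above):

```shell
# Compact a single SSTable in place, purging its droppable tombstones.
# The data-file path is an example; use the actual file from your data dir.
nodetool compact --user-defined \
  /var/lib/cassandra/data/my_keyspace/my_table-1b255f4/mc-42-big-Data.db
```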
Upvotes: 1
Reputation: 9475
It might depend somewhat on the compaction strategy you are using, but throwing away expired data should be fast: once the data is read during compaction, anything expired simply isn't written back to disk.
Probably date tiered would be the fastest at this, since it knows the age of the data in each SSTable and could discard entire SSTables.
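To gauge how much of a given file is reclaimable before kicking off any compaction, the sstablemetadata tool that ships with Cassandra reports an estimate (the file path below is illustrative):

```shell
# Print SSTable metadata, including the estimated droppable tombstone
# ratio - the fraction of the file that compaction could purge now.
sstablemetadata \
  /var/lib/cassandra/data/my_keyspace/my_table-1b255f4/mc-42-big-Data.db \
  | grep -i 'droppable'
```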
Upvotes: 0