GenerousJoker
GenerousJoker

Reputation: 97

Cassandra constant tombstone compaction of table

I have a couple of Cassandra tables on which tombstone compaction is constantly being run and I believe this is the reason behind high CPU usage by the Cassandra process.

Settings I have include:

compaction = {'tombstone_threshold': '0.01', 
'tombstone_compaction_interval': '1', 'class': 
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
default_time_to_live = 1728000
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0

In one of the tables I write data to it every minute. Because of the TTL that is set, a whole set of rows expire every minute too.

Upvotes: 2

Views: 1934

Answers (1)

Jeff Beck
Jeff Beck

Reputation: 3938

So the tombstone compaction can fire assuming the SSTable is as least as old as the compaction interval. SStables are created as things are compacted. the threshold is how much of the sstable is tombstones before compacting just for tombstones instead of joining sstables.

You are using leveled and have a 20 day ttl it looks like. You will be doing a ton of compactions as well as tombstone compactions just to keep up. Leveled will be the best to make sure you don't have old tombstone eating up disk space of the default compactors.

If this data is time-series which is sounds like it is you may want to consider using TWCS instead. This will create "buckets" which are each an sstable once compacted so once the ttl for the data in that table expires the compactor can drop the whole sstable which is much more efficient.

TWCS is available as a jar you need to add to the classpath for 2.1 and we use it currently in production. It has been added in the 3.x series of Cassandra as well.

Upvotes: 2

Related Questions