Adam Szecowka

Reputation: 694

Cassandra does not compact shadowed rows for TWCS

I have a Cassandra table with a default TTL. Unfortunately, the default TTL was too small, so now I want to update the default TTL, but I also need to update all existing rows. Right now my table uses 80 GB of data. I am wondering how to perform this operation without negatively impacting performance.
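Changing the table-level default itself is a one-line ALTER (a sketch; my_keyspace.my_table and the 30-day value are placeholders):

-- placeholder keyspace/table; 2592000 seconds = 30 days
ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 2592000;

As far as I know, though, the default only applies to rows written after the change, so every existing row still has to be rewritten.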

For testing purposes, I slightly adjusted the configuration of my table:

AND compaction = {'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'MINUTES',
    'compaction_window_size': '10',
    'tombstone_compaction_interval': '60',
    'log_all': 'true'}
AND default_time_to_live = 86400
AND gc_grace_seconds = 100
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND speculative_retry = '99PERCENTILE';

I am using the Time Window Compaction Strategy with a 10-minute compaction window. To speed everything up, I set tombstone_compaction_interval to 1 minute, so after one minute an SSTable is taken into account for tombstone compaction, and gc_grace_seconds is set to 100 seconds.

In my first scenario, I just overwrite every row without deleting it. As far as I understand, no tombstones are created in that scenario; I just shadow the previously inserted rows. So I perform the following steps:

  1. write data
  2. nodetool flush - to flush memtable to sstable
  3. overwrite all rows
  4. nodetool flush
  5. Even after one hour, both SSTables still exist:
-rw-r--r-- 1 cassandra cassandra 4.7M Jan 30 14:04 md-1-big-Data.db
-rw-r--r-- 1 cassandra cassandra 4.7M Jan 30 14:11 md-2-big-Data.db

Of course, if I execute nodetool compact, I end up with one SSTable of 4.7 MB, but I was expecting the old SSTable to be compacted away automatically, as happens when an SSTable contains many tombstones. In the second scenario, I executed the same operations, but I explicitly deleted every row before writing it again. The result was the following:

-rw-r--r-- 1 cassandra cassandra 4.7M Jan 30 16:16 md-4-big-Data.db
-rw-r--r-- 1 cassandra cassandra 6.2M Jan 30 16:35 md-5-big-Data.db

So the second SSTable was bigger, because it has to store both the tombstones and the new values. But again, the SSTables were not compacted.
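A quick way to see what the tombstone-driven single-SSTable compaction heuristic is considering is the sstablemetadata tool (a sketch; the data directory path below is an assumption and will vary per installation):

# placeholder data path; point it at the SSTables listed above
sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-*/md-*-big-Data.db | grep -i -E 'droppable|timestamp'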

Can you explain why automatic compaction was not executed? In this case the old row, the tombstone, and the new row could all be replaced by a single entry representing the new row.

Upvotes: 0

Views: 168

Answers (1)

Madhavan

Reputation: 649

First, a log_all value of true should not be left set in a production cluster for an indefinite period of time. You could test it out in lower environments and then remove it in the production cluster; I assume it has been turned on temporarily for triaging purposes only. There are other red flags in the setup above: for example, by setting gc_grace_seconds to 100 seconds you lose the opportunity/flexibility to recover from a catastrophic situation, since you're compromising on the default hint window and would have to fall back on manual repairs, etc. You can read about why that's not a great idea in other SO questions.
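For instance, reverting those two settings is a single ALTER (a sketch; my_keyspace.my_table is a placeholder, and because the compaction map is replaced as a whole, simply omitting log_all returns it to its default of false):

-- placeholder keyspace/table; 864000 seconds (10 days) is the default gc_grace_seconds
ALTER TABLE my_keyspace.my_table
  WITH gc_grace_seconds = 864000
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'MINUTES',
                    'compaction_window_size': '10'};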

The first question we need to ask is whether there is an opportunity for application downtime, and then decide among the options.

Given a downtime window, I would work with the procedure below. Remember, there are multiple ways to do this and this is just one of them.

  1. Ensure that the application(s) aren't accessing the cluster.
  2. Issue a DSBulk unload operation to get the data exported out.
  3. Truncate the table.
  4. Ensure you have the right table properties set (e.g. compaction settings, default TTL, etc.).
  5. Issue a DSBulk load operation, specifying the desired TTL value in seconds using --dsbulk.schema.queryTtl number_seconds (see the sketch after this list).
  6. Perform your validation before opening up application traffic again.
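A minimal command-line sketch of steps 2-5 (the keyspace/table name my_keyspace.my_table, the export directory /tmp/export, and the 30-day TTL of 2592000 seconds are all placeholders to adapt):

# 2. export the existing data
dsbulk unload -k my_keyspace -t my_table -url /tmp/export

# 3. truncate the table
cqlsh -e "TRUNCATE my_keyspace.my_table;"

# 4. set the desired table properties, e.g. the new default TTL
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 2592000;"

# 5. reload the data, writing every row with the desired TTL
dsbulk load -k my_keyspace -t my_table -url /tmp/export --dsbulk.schema.queryTtl 2592000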


Upvotes: 1
