Adam Szecowka

Reputation: 694

Cassandra does not compact shadowed rows for TWCS

I have a Cassandra table with a default TTL. Unfortunately, the default TTL was too small, so now I want to update the default TTL, but I also need to update all existing rows. Right now my table uses 80 GB of data. I am wondering how to perform this operation without negatively impacting performance.
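Changing the table-level default itself is a one-line ALTER (a sketch; my_keyspace.my_table and the 30-day value are placeholders):

-- placeholder keyspace/table; 2592000 seconds = 30 days
ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 2592000;

As far as I know, though, the default only applies to rows written after the change, so every existing row still has to be rewritten.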

For testing purposes, I slightly adjusted the configuration of my table:

AND compaction = {'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'MINUTES',
    'compaction_window_size': '10',
    'tombstone_compaction_interval': '60',
    'log_all': 'true'}
AND default_time_to_live = 86400
AND gc_grace_seconds = 100
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND speculative_retry = '99PERCENTILE';

I am using the Time Window Compaction Strategy with a 10-minute compaction window. To speed everything up, I set tombstone_compaction_interval to 1 minute, so after one minute an SSTable is taken into account for tombstone compaction, and gc_grace_seconds is set to 100 seconds.

In my first scenario, I just overwrite every row without deleting it. As far as I understand, no tombstones are created in that scenario; I just shadow the previously inserted rows. So I perform the following steps:

  1. write data
  2. nodetool flush - to flush memtable to sstable
  3. overwrite all rows
  4. nodetool flush
  5. Even after one hour, both SSTables still exist:
-rw-r--r-- 1 cassandra cassandra 4.7M Jan 30 14:04 md-1-big-Data.db
-rw-r--r-- 1 cassandra cassandra 4.7M Jan 30 14:11 md-2-big-Data.db

Of course, if I execute nodetool compact, I end up with one SSTable of 4.7 MB, but I was expecting the old SSTable to be compacted away automatically, as happens when an SSTable contains many tombstones. In the second scenario, I executed the same operations, but I explicitly deleted every row before writing it again. The result was the following:

-rw-r--r-- 1 cassandra cassandra 4.7M Jan 30 16:16 md-4-big-Data.db
-rw-r--r-- 1 cassandra cassandra 6.2M Jan 30 16:35 md-5-big-Data.db

So the second SSTable was bigger, because it has to store both the tombstones and the new values. But again, the SSTables were not compacted.
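A quick way to see what the tombstone-driven single-SSTable compaction heuristic is considering is the sstablemetadata tool (a sketch; the data directory path below is an assumption and will vary per installation):

# placeholder data path; point it at the SSTables listed above
sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-*/md-*-big-Data.db | grep -i -E 'droppable|timestamp'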

Can you explain why automatic compaction was not executed? In this case the old row, the tombstone, and the new row could all be replaced by a single entry representing the new row.

Upvotes: 0

Views: 168

Answers (1)

Madhavan

Reputation: 649

First, a log_all value of true should not be left set in a production cluster for an indefinite period of time. You could test it out in lower environments and then remove it in the production cluster; I assume it has been turned on temporarily for triaging purposes only. There are other red flags in the setup above: for example, by setting gc_grace_seconds to 100 seconds you lose the opportunity/flexibility to recover from a catastrophic situation, since you're compromising on the default hint window and would have to fall back on manual repairs, etc. You can read about why that's not a great idea in other SO questions.
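For instance, reverting those two settings is a single ALTER (a sketch; my_keyspace.my_table is a placeholder, and because the compaction map is replaced as a whole, simply omitting log_all returns it to its default of false):

-- placeholder keyspace/table; 864000 seconds (10 days) is the default gc_grace_seconds
ALTER TABLE my_keyspace.my_table
  WITH gc_grace_seconds = 864000
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_unit': 'MINUTES',
                    'compaction_window_size': '10'};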

The first question we need to ask is whether there is an opportunity for application downtime, and then decide among the options.

Given a downtime window, I would work with the procedure below. Remember, there are multiple ways to do this and this is just one of them.

  1. Ensure that the application(s) aren't accessing the cluster.
  2. Issue a DSBulk unload operation to get the data exported out.
  3. Truncate the table.
  4. Ensure you have the right table properties set (e.g. compaction settings, default TTL, etc.).
  5. Issue a DSBulk load operation, specifying the desired TTL value in seconds using --dsbulk.schema.queryTtl number_seconds (see the sketch after this list).
  6. Perform your validation before opening up application traffic again.
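A minimal command-line sketch of steps 2-5 (the keyspace/table name my_keyspace.my_table, the export directory /tmp/export, and the 30-day TTL of 2592000 seconds are all placeholders to adapt):

# 2. export the existing data
dsbulk unload -k my_keyspace -t my_table -url /tmp/export

# 3. truncate the table
cqlsh -e "TRUNCATE my_keyspace.my_table;"

# 4. set the desired table properties, e.g. the new default TTL
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH default_time_to_live = 2592000;"

# 5. reload the data, writing every row with the desired TTL
dsbulk load -k my_keyspace -t my_table -url /tmp/export --dsbulk.schema.queryTtl 2592000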


Upvotes: 1
