guy dadon
guy dadon

Reputation: 322

How to set TTL on Cassandra sstable

We are using Cassandra 3.10 with 6 nodes cluster.

lately, we noticed that our data volume increased drastically, approximately 4GB per day in each node. We want to implement a more aggressive retention policy in which we will change the compaction to TWCS with 1-hour window size and set a few days TTL, this can be achieved via the table properties.

Since the ETL should be a slow process in order to lighten Cassandra workload it possible that it will not finish extracting all the data until the TTL, so I wanted to know is there a way for the ETL process to set TTL=0 on entire SSTable once it done extracting it?

Upvotes: 0

Views: 301

Answers (2)

LetsNoSQL
LetsNoSQL

Reputation: 1538

You should set TTL 0 on table and query level as well. Once TTL expire data will converted to tombstones. Based on gc_grace_seconds value next compaction will clear all the tombstones. you may run major compaction also to clear tombstones but it is not recommended in cassandra based on compaction strategy. if STCS atleast 50% disk required to run healthy compaction.

Upvotes: 0

Chris Lohfink
Chris Lohfink

Reputation: 16400

TTL=0 is read as a tombstone. When next compacted it would be written tombstone or purged depending on your gc_grace. Other than the overhead of doing the writes of the tombstone it might be easier just to do a delete or create sstables that contain the necessary tombstones than to rewrite all the existing sstables. If its more efficient to do range or point tombstones will depend on your version and schema.

An option that might be easiest is to actually use a different compaction strategy all together or a custom one like https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy. You can then just purge data on compactions that have been processed. This still depends quite a bit on your schema on how hard it would be to mark whats been processed or not.

Upvotes: 1

Related Questions