reikje
reikje

Reputation: 3064

When does Cassandra remove data from an SSTable

In Cassandra 2.x when I delete one or multiple columns, they receive a tombstone in the Memtable but the data is not removed. At some point, the Memtable is flushed to an SSTable including the deleted data and the tombstone. When compaction is running, it will retain the tombstone with the specified grace period. What happens to the data? I have deleted a bunch of columns last week - less than gc_grace_seconds ago. I am not sure compaction has run yet. I haven't seen any change on disk size used yet, so I was wondering at which point is the data physically removed from disk?

Upvotes: 3

Views: 1033

Answers (1)

shutty
shutty

Reputation: 3338

In Cassandra 2.x when I delete one or multiple columns, they receive a tombstone in the Memtable but the data is not removed. At some point, the Memtable is flushed to an SSTable including the deleted data and the tombstone. When compaction is running, it will retain the tombstone with the specified grace period.

True.

What happens to the data?

The data will remain on disk at least for gc_grace_seconds. Next minor compaction right after gc_grace_seconds may remove it, but real timing depends mostly on your dataset and workload type.

I have deleted a bunch of columns last week - less than gc_grace_seconds ago. I am not sure compaction has run yet. I haven't seen any change on disk size used yet, so I was wondering at which point is the data physically removed from disk?

If you want to free some disk space, you can:

  • wait for gc_grace_seconds for normal minor compaction.
  • run nodetool compact which will trigger major compaction on current node freeing disk space right now.

Upvotes: 3

Related Questions