Reputation: 817
In our keyspace we have only a few tables, one of which contains most of the data. In that table, a single column (say column X) holds 99.99% of the data. When data is no longer relevant, we set a TTL of a few days and also set column X to null (from a Java process). Ideally, this should immediately free up significant disk space, since column X accounts for 90% of the total keyspace data, but we are not seeing any reduction in disk usage.
Also, after the TTL expires the data is deleted as expected, but again we are not seeing any space being freed.
What are we missing?
Upvotes: 2
Views: 146
Reputation: 87069
In Cassandra, no data is modified in place: all data files are immutable. When you perform a delete or insert a null (the two are equivalent), a special marker (a tombstone) is written, while the previous data stays on disk. So when you delete data, you're actually adding more data :-)
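To make the effect concrete, here is a toy model in Python. It is not how Cassandra is implemented, only a sketch of the append-only idea; the `ToyTable` class, its methods, and the "1 byte per tombstone" size are all made up for illustration.

```python
TOMBSTONE = None  # toy stand-in for Cassandra's deletion marker


class ToyTable:
    """Hypothetical append-only table: files are never edited after being written."""

    def __init__(self):
        self.sstables = []  # each "file" maps key -> (timestamp, value)
        self.clock = 0

    def write(self, key, value):
        # Every write lands in a new immutable file; older files are untouched.
        self.clock += 1
        self.sstables.append({key: (self.clock, value)})

    def delete(self, key):
        # A delete (or writing null) is just another write: a tombstone.
        self.write(key, TOMBSTONE)

    def read(self, key):
        # The newest version wins; a tombstone reads back as "no data".
        versions = [s[key] for s in self.sstables if key in s]
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]

    def disk_usage(self):
        # Toy accounting: value length in bytes, 1 byte per tombstone.
        return sum(
            1 if value is TOMBSTONE else len(value)
            for s in self.sstables
            for _, value in s.values()
        )

    def compact(self):
        # Merge all files, keeping only the newest version of each key.
        merged = {}
        for s in self.sstables:
            for key, (ts, value) in s.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        # Simplification: tombstones are dropped right away. Real Cassandra
        # keeps them until gc_grace_seconds has elapsed.
        live = {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}
        self.sstables = [live] if live else []


table = ToyTable()
table.write("row1", "x" * 1000)
before = table.disk_usage()
table.delete("row1")
after_delete = table.disk_usage()
table.compact()
after_compact = table.disk_usage()
print(before, after_delete, after_compact)  # prints: 1000 1001 0
```

In this model the delete makes usage go up by one, and space only comes back once `compact()` rewrites the files.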
The actual removal of the data happens when the SSTable files are compacted by background compaction. When a file gets compacted depends on the compaction strategy in use and its configuration options. There can be situations where old data sits in big files that may not be compacted for a while. Depending on your version of Cassandra/DSE, you can force compaction of all data by running nodetool compact -s on every node, but this requires enough free disk space (about the size of the table). Another option is to run nodetool garbagecollect -g CELL on the individual SSTables, but it also requires free disk space.
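If you want to script this per node, a minimal sketch in Python could look like the following. The helper name `reclaim_commands` and the `my_keyspace` / `my_table` names are placeholders, and limiting the commands to one keyspace and table is my assumption (the commands above are shown without arguments). The block only prints the command lines; it does not run anything.

```python
import shlex


def reclaim_commands(keyspace, table):
    """Build the two nodetool invocations mentioned above for one table.

    Hypothetical helper: keyspace and table are placeholders you supply.
    """
    return [
        # Major compaction; -s asks for split output instead of one big file.
        ["nodetool", "compact", "-s", keyspace, table],
        # Per-SSTable garbage collection at cell granularity.
        ["nodetool", "garbagecollect", "-g", "CELL", keyspace, table],
    ]


for cmd in reclaim_commands("my_keyspace", "my_table"):
    # Print only; to execute on a node you could pass cmd to
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```

You would normally pick one of the two commands rather than run both, and check free disk space on the node first.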
P.S. I recommend taking at least the DS201 course on DataStax Academy.
Upvotes: 3