user1968471

Reputation: 311

Multiple versions of db files in Cassandra data folder

I have been running code that reads from and writes to Cassandra column families. My table size is around 10 GB, but the db files for that same table consume around 400 GB on disk, spread across many versions of the files.

I would like to understand whether the latest version of the file set contains all the required data, and whether I can remove the older versions. Does Cassandra provide a facility for rolling deletion of such files?

Upvotes: 0

Views: 859

Answers (1)

xmas79

Reputation: 5180

The number you refer to is the SSTable number (I think it is technically called the generation). Specifically, the format of the filename is:

Version-Generation-SSTableFormat-ComponentFile

In your case:

Version = la (the SSTable format version; the table name comes from the enclosing directory)
Generation = 275x
SSTableFormat = BIG
ComponentFile = Data.db, TOC.txt, etc...
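
To see which component files belong to which generation, the four dash-separated fields can be split programmatically. This is only a minimal sketch: the function names are made up for illustration, the sample filenames are hypothetical, and it assumes every file follows the four-field layout described above (with `la` as the leading field).

```python
from typing import NamedTuple


class SSTableName(NamedTuple):
    prefix: str      # leading field, e.g. "la"
    generation: int  # SSTable generation number
    fmt: str         # SSTable format, e.g. "big"
    component: str   # component file, e.g. "Data.db" or "TOC.txt"


def parse_sstable_name(filename: str) -> SSTableName:
    """Split a four-field SSTable filename such as 'la-7-big-Data.db'."""
    parts = filename.split("-", 3)
    if len(parts) != 4 or not parts[1].isdigit():
        raise ValueError(f"unexpected SSTable filename: {filename!r}")
    prefix, generation, fmt, component = parts
    return SSTableName(prefix, int(generation), fmt, component)


def group_by_generation(filenames):
    """Map each generation number to the sorted list of its component files."""
    groups = {}
    for name in filenames:
        parsed = parse_sstable_name(name)
        groups.setdefault(parsed.generation, []).append(parsed.component)
    return {gen: sorted(components) for gen, components in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical filenames; the generation numbers are invented for illustration.
    sample = ["la-7-big-Data.db", "la-7-big-TOC.txt", "la-8-big-Data.db"]
    print(group_by_generation(sample))
```

Grouping this way only shows how the files are organised. It says nothing about which generations are safe to delete, and SSTable files should not be removed by hand on a running node.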

You can't really tell whether the latest SSTable contains all the data you need. The disk space consumed by older generations can be released only once their data is no longer referenced (snapshots come to mind) and their tombstones are older than gc_grace_seconds.
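
The age condition on tombstones can be written down as a small check. This is a rough sketch only: the function name is hypothetical, the default of 864000 seconds (10 days) is Cassandra's default gc_grace_seconds and may differ on your table, and the check covers only the age condition. Whether a given compaction actually drops the tombstone also depends on which SSTables take part in it.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_GC_GRACE_SECONDS = 864000  # Cassandra's default: 10 days


def tombstone_past_gc_grace(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True once the tombstone is older than gc_grace_seconds.

    Only the age condition is checked here; it does not model compaction.
    """
    return now - deleted_at > timedelta(seconds=gc_grace_seconds)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A delete issued 19 days ago is past the default 10-day grace period.
    print(tombstone_past_gc_grace(now - timedelta(days=19), now))
```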

You should first check whether you have any snapshots and, if so, use nodetool to remove them. Then investigate how your tombstones are distributed among these SSTables; if the tombstones cannot be compacted away, you probably have a bigger problem to solve (e.g. a schema redesign or a data migration to a new cluster).
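
As a quick way to check for snapshots, the data directory can be scanned for `snapshots` subdirectories and their file sizes summed. The sketch below assumes the usual on-disk layout of `<data_dir>/<keyspace>/<table>/snapshots/<snapshot_name>/`; the function name and the example path are assumptions, so adjust them to your installation. `nodetool listsnapshots` reports similar information, and `nodetool clearsnapshot` removes snapshots.

```python
import os


def snapshot_usage(data_dir):
    """Return {snapshot_directory: total_bytes} for each snapshot under data_dir."""
    usage = {}
    for root, _dirs, files in os.walk(data_dir):
        parts = os.path.relpath(root, data_dir).split(os.sep)
        if "snapshots" not in parts:
            continue
        idx = parts.index("snapshots")
        if idx + 1 >= len(parts):
            continue  # the "snapshots" directory itself, not a named snapshot
        # Attribute every file to the named snapshot directory it lives under.
        snapshot_dir = os.path.join(data_dir, *parts[: idx + 2])
        size = sum(os.path.getsize(os.path.join(root, f)) for f in files)
        usage[snapshot_dir] = usage.get(snapshot_dir, 0) + size
    return usage


if __name__ == "__main__":
    # Assumed data directory; change this to match your cassandra.yaml.
    for path, size in sorted(snapshot_usage("/var/lib/cassandra/data").items()):
        print(f"{size / 1024 ** 3:8.2f} GiB  {path}")
```

Snapshot files are hard links to SSTables, so the sizes reported here are apparent sizes. A snapshot only costs extra disk space for the SSTables that compaction has since replaced in the live table.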

Upvotes: 2
