Reputation: 311
I have been running code that reads from and writes to Cassandra column families. I have observed that my table size is around 10 GB, but the space consumed on disk by the db files for that table is around 400 GB, spread across different versions of the files:
la-2749-big-Statistics.db
la-2750-big-Data.db la-2750-big-Index.db la-2750-big-Filter.db la-2750-big-Summary.db la-2750-big-Statistics.db la-2750-big-Digest.adler32 la-2750-big-CRC.db la-2750-big-TOC.txt
la-2751-big-Data.db la-2751-big-Index.db la-2751-big-Filter.db la-2751-big-Summary.db la-2751-big-Statistics.db la-2751-big-Digest.adler32 la-2751-big-CRC.db la-2751-big-TOC.txt
I would like to understand whether the latest version of the file set contains all the required data, so that I can remove the older versions. Does Cassandra provide a facility for rolling deletion of such files?
Upvotes: 0
Views: 859
Reputation: 5180
The number you refer to is the sequence number of the SSTable (technically it is called the generation). Specifically, the format of the filename is:
Version-Generation-SSTableFormat-ComponentFile
In your case:
Version = la (the SSTable format version; "la" is the version written by Cassandra 2.2)
Generation = 275x
SSTableFormat = big
ComponentFile = Data.db, TOC.txt, etc.
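For example, splitting one of your filenames on the dashes shows those four parts (a plain illustration, not a Cassandra tool):

    # split one of the filenames above into its four parts
    echo "la-2750-big-Data.db" |
      awk -F- '{print "version:", $1, "generation:", $2, "format:", $3, "component:", $4}'
    # -> version: la generation: 2750 format: big component: Data.db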
You can't really tell whether the last SSTable contains all the data you need. The space consumed on disk by old generations can be released only once their data is no longer referenced (snapshots come to mind) and the age of their tombstones is greater than gc_grace_seconds.
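If you want to gauge how much of that space is tombstones, the sstablemetadata tool that ships with Cassandra reports an estimated droppable-tombstone ratio per SSTable. A minimal check, assuming the default data directory layout (the keyspace and table names below are illustrative):

    # a ratio greater than 0 means part of this SSTable could be purged
    # by a compaction once gc_grace_seconds has elapsed
    sstablemetadata /var/lib/cassandra/data/mykeyspace/mytable-*/la-2750-big-Data.db | grep -i tombstone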
You should first check whether you have any snapshots and, if necessary, use nodetool clearsnapshot to remove them. Then you should investigate how your tombstones are distributed among these SSTables; if they cannot get compacted away, you probably have a bigger problem to solve (e.g. a schema redesign, or a data migration to a new cluster).
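A minimal sequence for the snapshot check could look like this (the keyspace name is illustrative; on recent Cassandra versions clearsnapshot may require the -t <tag> or --all option):

    # list existing snapshots and the space they pin on disk
    nodetool listsnapshots

    # remove all snapshots for a given keyspace, letting the obsolete
    # SSTables be deleted from disk
    nodetool clearsnapshot mykeyspace

Snapshots are hard links, so they are nearly free to take, but they keep SSTables that compaction has already superseded alive on disk, which matches the 10 GB vs 400 GB symptom you describe.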
Upvotes: 2