Reputation: 11
After unsuccessful nodetool repair
operation I got two big sstable files (last two in the listing below) instead of one, each having the same size as a single file before. And now this files cannot be merged back by common tools (nodetool clean, nodetool compact, nodetool repair). Tables are replicated to another cassandra node (replication_factor: 2), and there are two big sstable files as well now.
-rw-r--r-- 1 cassandra cassandra 16M Mar 5 12:36 mc-116413-big-Data.db
-rw-r--r-- 1 cassandra cassandra 34M Mar 5 01:21 mc-116320-big-Index.db
-rw-r--r-- 1 cassandra cassandra 39M Mar 3 22:46 mc-116125-big-Index.db
-rw-r--r-- 1 cassandra cassandra 66M Mar 5 12:25 mc-116412-big-Data.db
-rw-r--r-- 1 cassandra cassandra 262M Mar 5 05:51 mc-116365-big-Data.db
-rw-r--r-- 1 cassandra cassandra 263M Mar 5 08:46 mc-116386-big-Data.db
-rw-r--r-- 1 cassandra cassandra 263M Mar 5 11:42 mc-116407-big-Data.db
-rw-r--r-- 1 cassandra cassandra 7.2G Mar 5 03:18 mc-116345-big-Data.db
-rw-r--r-- 1 cassandra cassandra 43G Mar 3 22:46 mc-116125-big-Data.db
-rw-r--r-- 1 cassandra cassandra 48G Mar 5 01:21 mc-116320-big-Data.db```
I suppose that one of this files contains duplicated data. How can I compact files back to a single file?
Upvotes: 1
Views: 207
Reputation: 16303
Maybe I'm not looking properly but I don't see any duplicate SSTable files in the file listing you posted.
If you're referring to these 2:
-rw-r--r-- 1 cassandra cassandra 43G Mar 3 22:46 mc-116125-big-Data.db
-rw-r--r-- 1 cassandra cassandra 48G Mar 5 01:21 mc-116320-big-Data.db
They're not duplicates because they have 2 different generation IDs -- 116125
and 116320
. This means they also have different ancestors.
If you're referring to these:
-rw-r--r-- 1 cassandra cassandra 39M Mar 3 22:46 mc-116125-big-Index.db
-rw-r--r-- 1 cassandra cassandra 43G Mar 3 22:46 mc-116125-big-Data.db
-rw-r--r-- 1 cassandra cassandra 34M Mar 5 01:21 mc-116320-big-Index.db
-rw-r--r-- 1 cassandra cassandra 48G Mar 5 01:21 mc-116320-big-Data.db
Again, they're not duplicates of each other. The *-Data.db
files contain the actual data. The *-Index.db
files are component files which contain the partition index, i.e. the index of the partitions within the data files which are used for fast retrieval.
If you're interested, I've explained it in a bit more detail in this post -- https://community.datastax.com/questions/5219/. Cheers!
[UPDATE] To respond to this follow-up question:
Could you suppose why this two files don`t compacted in a single file, as usual do?
Assuming the table is configured with SizeTieredCompactionStrategy
, it will require similar-sized sstables as candidates before they get compacted together.
The default minimum sstable candidates is min_threshold: 4
so you need 4 similarly-sized sstables for a compaction to be triggered.
Upvotes: 1