VIKRAM SINGH CHOUHAN

Reputation: 65

Cassandra hard disk requirement with SizeTieredCompactionStrategy

I was going through Cassandra's SizeTieredCompactionStrategy and found out that it can sometimes temporarily double the on-disk size of the dataset's largest table during compaction. But I couldn't find any information about when this can happen. Does anyone know about this?

Upvotes: 2

Views: 119

Answers (1)

Alex Ott

Reputation: 87214

This requirement arises from the fact that the compaction process needs enough space to take all the SSTables that are to be compacted, read the data from them, and write the new SSTable to the same disk. Consider the worst case: a table whose SSTables are all compacted together, their total size is 50% of the available disk space, and no data gets thrown away. The compaction process then writes a single new SSTable equal in size to the input data, so both the inputs and the output must fit on disk at the same time. And if the input data occupies more than 50% of the disk, compaction won't have enough space to write the new version.
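
A minimal sketch of that worst-case arithmetic (the disk and table sizes are hypothetical, just to illustrate the 50% boundary):

    # Worst-case STCS arithmetic (hypothetical numbers): all of a table's
    # SSTables are compacted together and nothing is purged, so the output
    # SSTable is as large as the inputs, and both must coexist on disk.
    disk_capacity_gb = 1000
    table_size_gb = 500               # total size of the SSTables being compacted

    peak_usage_gb = table_size_gb * 2  # inputs + newly written SSTable
    print(f"peak disk usage: {peak_usage_gb} GB of {disk_capacity_gb} GB")
    # -> 1000 GB of 1000 GB: any table bigger than 50% of the disk
    #    cannot be fully compacted in place.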

In a real situation, you need enough space to compact the biggest SSTables of your biggest table, across the N compactions that may run at the same time (see the sketch below). If you have many tables of similar size, then this restriction is not so strong...
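
A rough way to estimate that headroom, assuming (hypothetically) you know the SSTable sizes per table and how many compaction threads can run concurrently; this is only an illustration of the reasoning above, not an official Cassandra formula:

    # Pessimistic headroom estimate: each concurrent compaction rewrites all
    # of a table's SSTables without discarding any data, so it temporarily
    # needs free space equal to the table's total SSTable size.
    from typing import Dict, List

    def required_headroom_gb(sstable_sizes_gb: Dict[str, List[float]],
                             concurrent_compactors: int) -> float:
        # Largest candidate compaction per table = sum of its SSTable sizes.
        per_table = sorted((sum(sizes) for sizes in sstable_sizes_gb.values()),
                           reverse=True)
        # The N largest compactions may run at the same time.
        return sum(per_table[:concurrent_compactors])

    sizes = {"big_table": [120.0, 110.0, 100.0, 95.0],
             "small_table": [10.0, 9.0, 8.0, 8.0]}
    print(required_headroom_gb(sizes, concurrent_compactors=2))  # 460.0

With many similarly sized tables the per-table totals are small, which is why the restriction is much weaker than the single-huge-table worst case.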

Upvotes: 2
