Radhika

Reputation: 423

How to rebalance and reclaim disk space after adding a Cassandra node

I have a 12-node Cassandra cluster that carries a heavy data load, and disk space is close to full. I have expanded the cluster by adding one node and plan to add a couple more. After adding the new node, the data load went down, but the disk space used did not. I am wary of running nodetool repair because it may need additional disk space, and the available space may not be sufficient. There are suggestions to use nodetool cleanup, but it looks like this will also cause a temporary increase in disk usage: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/tools/toolsCleanup.html

Please suggest a better way, if there is one, to clean up old data from the other nodes and reclaim disk space.

Upvotes: 1

Views: 1052

Answers (1)

Erick Ramirez

Reputation: 16393

Unfortunately, nodetool cleanup is the only way to evict data that a node no longer owns after nodes are added to a cluster, so it is the only way to reclaim that disk space.

For cleanup to work, it temporarily uses more space because it has to re-compact SSTables into new ones. This can be a problem if you have very large SSTables, several GB each, and not much free disk space.
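To get a rough feel for whether a node has enough headroom before running cleanup, you could compare its free disk space with the size of its largest SSTables. The following is only a minimal sketch, not an official tool: the function names are made up for illustration, and it assumes that rewriting an SSTable may temporarily need up to the size of the original file, and that `concurrent_rewrites` SSTables could be rewritten at the same time.

```python
import shutil
from pathlib import Path

# Directories that hold hard links or copies rather than live SSTables.
SKIPPED_DIRS = {"snapshots", "backups"}


def sstable_sizes(data_dir):
    """Return (size_in_bytes, path) for each live SSTable data file, largest first."""
    root = Path(data_dir)
    found = []
    for f in root.rglob("*-Data.db"):
        # Ignore files under snapshots/ and backups/ inside the data directory.
        if SKIPPED_DIRS & set(f.relative_to(root).parts):
            continue
        found.append((f.stat().st_size, f))
    return sorted(found, key=lambda item: item[0], reverse=True)


def cleanup_headroom(data_dir, concurrent_rewrites=1):
    """Return (free_bytes, needed_bytes) for a worst-case cleanup estimate.

    Assumption: each SSTable being rewritten may need up to its own size
    in temporary space, and `concurrent_rewrites` of them run at once.
    """
    sizes = sstable_sizes(data_dir)
    needed = sum(size for size, _ in sizes[:concurrent_rewrites])
    free = shutil.disk_usage(data_dir).free
    return free, needed
```

For example, `free, needed = cleanup_headroom("/var/lib/cassandra/data")` (the path is an assumption; use your own data directory) gives two numbers to compare. If `needed` is close to or above `free`, the largest SSTables are the ones to deal with first.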

You can work around this problem for large SSTables in tables configured with SizeTieredCompactionStrategy by splitting them into smaller files on another server with the sstablesplit tool. I've documented the instructions in https://community.datastax.com/questions/6415/. Cheers!
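As a rough illustration of picking the files to split, the sketch below lists the SSTable data files above a size threshold and builds an `sstablesplit` command line for each one. It is a hypothetical helper, not part of the linked instructions: the function name, the threshold, and the target size are assumptions, and it only prints commands rather than running them. It relies on `sstablesplit` accepting `-s` for the maximum output size in MB; check your version's help output before using it, and run the tool only against SSTables that a live Cassandra process is not using.

```python
from pathlib import Path


def split_commands(sstable_dir, threshold_bytes, max_size_mb=50):
    """Build one sstablesplit argument list per SSTable larger than the threshold.

    Assumptions: `sstable_dir` holds the copied SSTables, and `max_size_mb`
    is the desired upper bound for each output file.
    """
    commands = []
    for data_file in sorted(Path(sstable_dir).rglob("*-Data.db")):
        if data_file.stat().st_size > threshold_bytes:
            commands.append(["sstablesplit", "-s", str(max_size_mb), str(data_file)])
    return commands


def print_split_commands(sstable_dir, threshold_bytes, max_size_mb=50):
    """Print the commands for review instead of executing them."""
    for cmd in split_commands(sstable_dir, threshold_bytes, max_size_mb):
        print(" ".join(cmd))
```

Printing the commands first lets you review which files would be touched before anything is rewritten.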

Upvotes: 2
