I have a 12-node Cassandra cluster with a heavy data load, and disk usage is nearing full capacity. I have expanded the cluster by adding one node and plan to add a couple more. I can see that the data load went down after adding the new node; however, disk usage has not. I am wary of running nodetool repair because it may require additional disk space, and the available space may not be sufficient. Others have suggested nodetool cleanup, but it looks like this will also temporarily increase disk usage: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/tools/toolsCleanup.html
Are there better ways to clean up the old data on the other nodes and reclaim disk space?
Unfortunately, nodetool cleanup is the only way to evict data that a node no longer owns after nodes are added to a cluster, and therefore the only way to reclaim that disk space.
For cleanup to work, it temporarily uses more disk space, because it has to re-compact the existing SSTables into new ones. This can be a problem if you have very large SSTables, several GB each, and not much free disk space left.
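As a rough illustration of that extra space, here is a minimal sketch that estimates the worst-case headroom a cleanup pass might need. It is a simplified model, not Cassandra's own accounting: it assumes each SSTable being cleaned is rewritten in full before the original is deleted, and that concurrent_jobs SSTables are processed at the same time (a hypothetical parameter standing in for however many compaction threads cleanup uses). The function names and the SSTable sizes in the example are hypothetical.

```python
from typing import Iterable

GB = 1024 ** 3


def cleanup_headroom_bytes(sstable_sizes: Iterable[int], concurrent_jobs: int = 1) -> int:
    """Worst-case extra disk space a cleanup pass might need (simplified model).

    Assumption: each SSTable being cleaned is copied in full before the
    original is deleted, so the peak overhead is the combined size of the
    `concurrent_jobs` largest SSTables being rewritten at the same time.
    """
    if concurrent_jobs < 1:
        raise ValueError("concurrent_jobs must be >= 1")
    largest_first = sorted(sstable_sizes, reverse=True)
    return sum(largest_first[:concurrent_jobs])


def cleanup_fits(sstable_sizes: Iterable[int], free_bytes: int, concurrent_jobs: int = 1) -> bool:
    """True if the free space covers the worst-case headroom estimate."""
    return cleanup_headroom_bytes(sstable_sizes, concurrent_jobs) <= free_bytes


if __name__ == "__main__":
    # Hypothetical node: one very large SSTable dominates the estimate.
    sizes = [120 * GB, 40 * GB, 8 * GB, 2 * GB]
    print(cleanup_headroom_bytes(sizes) // GB)                     # 120
    print(cleanup_fits(sizes, free_bytes=100 * GB))                # False
    print(cleanup_headroom_bytes(sizes, concurrent_jobs=2) // GB)  # 160
```

In this model it is the largest SSTable, not the total amount of data on the node, that sets the minimum free space cleanup needs, which is why a few multi-GB files can block it on a nearly full disk.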
You can work around this problem for large SSTables in tables configured with SizeTieredCompactionStrategy by splitting them into smaller files on another server using the sstablesplit tool. I've documented the instructions at https://community.datastax.com/questions/6415/. Cheers!
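To see why splitting helps, here is a rough sketch of the arithmetic, under the simplified assumption that cleaning a file needs roughly that file's size in free space. It assumes sstablesplit's default maximum output size of 50 MB (its -s/--size option); real output files are only approximately that size because the tool splits on partition boundaries. The helper split_sizes is hypothetical: it only models the resulting file sizes and does not read or write SSTables.

```python
from typing import List

MB = 1024 ** 2
GB = 1024 ** 3


def split_sizes(sstable_bytes: int, target_mb: int = 50) -> List[int]:
    """Approximate sizes of the files produced by splitting into target_mb pieces.

    Assumption: every output file is exactly target_mb except a smaller
    final remainder. Real splits happen on partition boundaries, so actual
    sizes only approximate this.
    """
    if sstable_bytes < 0 or target_mb <= 0:
        raise ValueError("sstable_bytes must be >= 0 and target_mb > 0")
    target = target_mb * MB
    full_pieces, remainder = divmod(sstable_bytes, target)
    return [target] * full_pieces + ([remainder] if remainder else [])


if __name__ == "__main__":
    # Hypothetical 120 GB SSTable split with the assumed 50 MB default.
    pieces = split_sizes(120 * GB)
    # Total data is unchanged; only the largest single file shrinks.
    print(len(pieces), sum(pieces) == 120 * GB)  # 2458 True
    print(max(pieces) // MB)                     # 50
```

Splitting does not free any space by itself. In this model it only caps the size of the largest file, so a later cleanup needs headroom on the order of one small piece per concurrent job rather than one whole multi-GB SSTable.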