kha
kha

Reputation: 19993

Repartitioning of data in Cassandra when adding new servers

Let's imagine I have a Cassandra cluster with 3 nodes, each having 100GB of available hard disk space. Replication Factor for this cluster is set to 3 and R/W CLs are set to 2, meaning I can tolerate one of my nodes going down without sacrificing consistency or availability.

Now imagine my servers have started to fill up (80GB as an example) and I would like to add another 3 servers of the same specification to my cluster, maintaining the same CLs and RFs.

My question is: after I've added the new nodes to my cluster and run the node repair tool, is it fair to assume that each of my nodes should roughly (more or less a few GBs) contain 40GB of data each?

If not, how can I add new nodes without having the fear of running out of hard disk space?

A little background of why I'm asking this question: I am developing an app that connects to a server that runs Cassandra for its data storage. As this is only developed by me, and I have limited resources in terms of money to buy servers, I've decided that I would like to buy small, cheap "servers" instead of the more expensive rack options but I'm really worried about the nodes running out of space if the disk allocation is not (at least partially) homogenous.

Many thanks for you help,

Upvotes: 4

Views: 228

Answers (1)

RussS
RussS

Reputation: 16576

My question is: after I've added the new nodes to my cluster and run the node repair tool, is it fair to assume that each of my nodes should roughly (more or less a few GBs) 40GB of data each

After also running nodetool cleanup you should see roughly 40GB of data on each node. Cleanup removes data which the node is no longer responsible for. If you don't run this command the old data will remain on the machine.

Upvotes: 6

Related Questions