Reputation: 2301
I am trying to find out how much data each Cassandra node in a cluster can hold before it starts showing latency. Basically, I need to know when to start adding new nodes to the existing cluster. I am referring to this page.
We use VMs with a single 100 G data disk. Here is how I calculated the usable disk space for each node.
raw_capacity = disk_size * number_of_data_disk = 100 G * 1 = 100 G
formatted_disk_space = (raw_capacity * 0.9) = 100 G * 0.9 = 90 G
usable_disk_space = formatted_disk_space * (0.5 to 0.8) = 90 G * 0.5 = 45 G
So this means each node can hold up to 45 G of data. Is my understanding correct?
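The calculation above can be written as a small Python sketch. The function name and parameter names are hypothetical, chosen only for illustration; the 0.9 formatting factor and the 0.5 to 0.8 usable fraction are the values quoted in the question.

```python
def usable_disk_space_gb(disk_size_gb, data_disks,
                         format_factor=0.9, usable_fraction=0.5):
    """Estimate usable space per node, following the formula in the question.

    format_factor and usable_fraction are the values quoted in the question
    (0.9, and 0.5 to 0.8); they are assumptions to tune, not fixed constants.
    """
    raw_capacity = disk_size_gb * data_disks
    formatted_disk_space = raw_capacity * format_factor
    return formatted_disk_space * usable_fraction


# Conservative and optimistic ends of the 0.5 to 0.8 range for one 100 G disk.
low = usable_disk_space_gb(100, 1, usable_fraction=0.5)
high = usable_disk_space_gb(100, 1, usable_fraction=0.8)
print(f"usable space per node: {low:.0f} G to {high:.0f} G")
```

With one 100 G disk this gives a range rather than a single number, so 45 G is the conservative end of the estimate.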
Also, if I need to compare this with the current data size, can I compare it directly with the nodetool status output? As per the above calculation, each node can hold up to 45 G, whereas each node in my cluster is currently holding only around 11 G of data. I have been trying to read up on this, but I still cannot work it out.
Datacenter: prod_east
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
UN <IP_1> 11.17 GB NO TOKENS ? <token> rack1
UN <IP_2> 12.23 GB NO TOKENS ? <token> rack1
UN <IP_3> 10.72 GB NO TOKENS ? <token> rack1
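As a rough illustration of the comparison being asked about, the sketch below divides each node's reported load by the 45 G estimate. The helper names and the threshold are hypothetical, and this only checks disk usage; it does not predict latency.

```python
def disk_utilization(loads_gb, usable_gb):
    """Return each node's load as a fraction of the usable space estimate."""
    return [load / usable_gb for load in loads_gb]


def should_add_nodes(loads_gb, usable_gb, threshold=1.0):
    """True if any node has reached the chosen fraction of usable space.

    threshold is an assumption; pick a lower value to plan ahead of time.
    """
    return max(disk_utilization(loads_gb, usable_gb)) >= threshold


# Load values taken from the nodetool status output above.
loads = [11.17, 12.23, 10.72]
for load, fraction in zip(loads, disk_utilization(loads, 45.0)):
    print(f"{load:.2f} GB -> {fraction:.0%} of usable space")
print("add nodes:", should_add_nodes(loads, 45.0))
```

By this measure the three nodes are well below the 45 G estimate.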
Any help here is highly appreciated.
Upvotes: 0
Views: 712
Reputation: 313
The load reported by nodetool status takes the replication factor into account, so each node might own 100% of the data, or maybe less. Try passing the name of your keyspace as an argument to nodetool status, and it will show you how much of the data each node owns.
Here is an example:
nodetool status your_keyspace_name
Datacenter: dc1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 47.66 MB 1 33.3% x rack1
UN 127.0.0.2 47.67 MB 1 33.3% x rack1
UN 127.0.0.3 47.67 MB 1 33.3% x rack1
Upvotes: 1