user1401472

Reputation: 2301

Cassandra Node Capacity Calculation

I am trying to find out how much data each Cassandra node in a cluster can hold before latency starts to increase. Basically, I need to know when it is the right time to start adding new nodes to the existing cluster. I am referring to this page.

We use VMs with a single 100 G data disk. Here is how I calculated the usable disk space for each node.

raw_capacity = disk_size * number_of_data_disk = 100 G * 1 = 100 G

formatted_disk_space = (raw_capacity * 0.9) = 100 G * 0.9 = 90 G

usable_disk_space = formatted_disk_space * (0.5 to 0.8) = 90 G * 0.5 = 45 G
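The same arithmetic can be written as a small Python sketch, which makes it easier to try other disk sizes or fractions. The function and parameter names are made up for illustration; the 0.9 formatting factor and the 0.5 to 0.8 usable range are the figures from the calculation above, not values read from Cassandra itself.

```python
def usable_disk_space_gb(disk_size_gb, data_disks=1,
                         format_factor=0.9, usable_fraction=0.5):
    """Estimate usable space per node, in GB, from the three steps above."""
    raw_capacity = disk_size_gb * data_disks        # 100 G * 1
    formatted = raw_capacity * format_factor        # minus formatting overhead
    return formatted * usable_fraction              # headroom kept free on disk


# Lower and upper ends of the 0.5 to 0.8 range for one 100 G data disk.
low = usable_disk_space_gb(100, usable_fraction=0.5)
high = usable_disk_space_gb(100, usable_fraction=0.8)
print(f"usable space per node: {low:.1f} G to {high:.1f} G")
```

With the conservative 0.5 fraction this gives the 45 G figure; the 0.8 end of the range gives 72 G.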

So this means each node can hold up to 45 G of data. Is this understanding correct?

Also, if I need to compare this with the current data size, can I compare it directly with the nodetool status output? According to the calculation above, a node can hold up to 45 G, whereas each node in my cluster currently holds only around 11 G. I have been trying to read up on this, but I have not been able to understand it.

Datacenter: prod_east
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving

UN  <IP_1>  11.17 GB   NO TOKENS          ?       <token>  rack1
UN  <IP_2>  12.23 GB   NO TOKENS          ?       <token>  rack1
UN  <IP_3>  10.72 GB   NO TOKENS          ?       <token>  rack1
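To make the comparison concrete, here is a rough Python sketch that pulls the load figure out of each node row above and expresses it as a fraction of the 45 G budget. The regular expression, the helper names, and the handling of unit spellings are my own assumptions about the output layout, not anything provided by nodetool.

```python
import re

# Matches node rows such as "UN  <IP_1>  11.17 GB  ...": status/state flags,
# address, numeric load, unit. Accepting both "GB" and "GiB" style spellings
# is an assumption about how different versions print the load.
ROW_RE = re.compile(r"^[UD][NLJM]\s+(\S+)\s+([\d.]+)\s*(bytes|[KMGT]i?B)\b")

# Conversion factors to GB, keyed by the first letter of the unit.
TO_GB = {"b": 1 / 1024**3, "K": 1 / 1024**2, "M": 1 / 1024,
         "G": 1.0, "T": 1024.0}


def parse_loads(status_output):
    """Return {address: load in GB} for every node row in the output."""
    loads = {}
    for line in status_output.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            address, value, unit = match.groups()
            loads[address] = float(value) * TO_GB[unit[0]]
    return loads


def utilization(loads, usable_gb):
    """Return {address: fraction of the usable budget currently in use}."""
    return {address: load / usable_gb for address, load in loads.items()}


sample = """\
UN  <IP_1>  11.17 GB   NO TOKENS          ?       <token>  rack1
UN  <IP_2>  12.23 GB   NO TOKENS          ?       <token>  rack1
UN  <IP_3>  10.72 GB   NO TOKENS          ?       <token>  rack1
"""

for address, fraction in utilization(parse_loads(sample), usable_gb=45.0).items():
    print(f"{address}: {fraction:.1%} of the 45 G budget")
```

Header and legend lines are skipped because they do not start with a two-letter status/state code followed by an address and a load.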

Any help here is highly appreciated.

Upvotes: 0

Views: 712

Answers (1)

Saifallah KETBI

Reputation: 313

The Load column in nodetool status includes replicated data, so depending on the replication factor each node may own 100% of the data or less. Try passing the name of your keyspace as an argument to nodetool status; it will then show the share of the data that each node owns.

Here is an example:

nodetool status your_keyspace_name

Datacenter: dc1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
    Address    Load      Tokens  Owns   Host ID  Rack
UN  127.0.0.1  47.66 MB  1       33.3%  x        rack1
UN  127.0.0.2  47.67 MB  1       33.3%  x        rack1
UN  127.0.0.3  47.67 MB  1       33.3%  x        rack1

Upvotes: 1
