Reputation: 1
In Cassandra Cluster, how can we ensure all nodes are having almost equal data, instead one node has more data, another has very less.
If this scenario occurs, what are the best practices
Thanks
Upvotes: 0
Views: 524
Reputation: 5180
Unless you are using ByteOrderedPartitioner for your cluster that should not happen. See DataStax documentation here for more information about available partitioners and why it should not (normally) happen.
Upvotes: 0
Reputation: 16403
It is ok to expect a slight variation of 5-10%. The most common causes are the distribution of your partitions may not be truly random (more partitions on some nodes) and there may be a large variation in the size of the partitions (smallest partition is a few kilobytes but largest partition is 2GB).
There are also 2 other possible scenarios to consider.
SINGLE-TOKEN CLUSTER
If the tokens are not correctly calculated, some nodes may have a larger token range compared to others. Use the token generation tool to get a list of tokens that is correctly distributed around the ring.
If the cluster is deployed with DataStax Enterprise, the easiest way is to rebalance your cluster with OpsCenter.
VNODES CLUSTER
Confirm that you have allocated the same number of tokens in cassandra.yaml
with the num_tokens
directive.
Upvotes: 0