Reputation: 25
Using vnodes with 256 tokens per node, my cluster shows the info below when executing nodetool status. The load on my cluster seems extremely unbalanced, and I don't know what causes this. Is the partition key of my tables related? Any comments would be welcome. Thanks!
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.190 9.78 GiB 256 ? f3e56d8d-caf2-450a-b4f1-e5ac5163a17a rack1
UN 192.168.1.191 77.53 MiB 256 ? e464cda9-ca8b-400b-82eb-59256d23c94a rack1
UN 192.168.1.192 89.31 MiB 256 ? 6feaa65f-2523-4b65-9717-f4f3cbf28ef0 rack1
Upvotes: 0
Views: 899
Reputation: 87254
Yes, most probably there is a skew in the distribution of partition keys; most probably some partitions have many more rows than others. Check this document for recommendations, especially the sections "Number of cells per partition" and "Big partitions". You can use a number of tools to check this hypothesis:

- nodetool tablehistograms (may need to be executed for every table separately) on each host will show you the number of cells and the partition size in bytes at the 50%, 75%, ..., and 100% percentiles. You may see very big differences between the 95% & 100% percentiles.
- nodetool tablestats will show the max & average size of the partition per table per host.
- dsbulk count -k keyspace -t table --log.verbosity 0 --stats.mode partitions
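The "big jump between high percentiles" signal those tools report can also be sanity-checked by hand. Here is a hypothetical sketch in Python (the per-partition cell counts are made-up sample data, not taken from your cluster) that computes the same kind of percentile view nodetool tablehistograms gives you:

```python
# Sketch: reproduce the percentile view that nodetool tablehistograms reports,
# using a made-up list of per-partition cell counts.
def percentile(sorted_values, pct):
    """Nearest-rank percentile over a pre-sorted list."""
    idx = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[idx]

# Hypothetical skewed table: most partitions are small, a few are huge.
cells_per_partition = sorted([50] * 900 + [400] * 80 + [30000] * 20)

for pct in (50, 75, 95, 99):
    print(f"{pct}%: {percentile(cells_per_partition, pct)} cells")
```

A large jump between the 95% value and the 99%/Max values is the signature of a few oversized partitions dominating the table.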
Upvotes: 1
Reputation: 25
Most tables are uniform, but this one is not:
kairosdb/data_points histograms —— NODE1 load 9.73GB
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 2.00 17.08 152.32 1597 86
75% 2.00 35.43 182.79 9887 446
95% 2.00 88.15 263.21 73457 3973
98% 2.00 105.78 263.21 315852 20501
99% 2.00 105.78 263.21 545791 29521
Min 2.00 6.87 126.94 104 0
Max 2.00 105.78 263.21 785939 35425
kairosdb/data_points histograms —— NODE2 load 36.95MB
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 1.00 20.50 454.83 1109 42
75% 2.00 42.51 943.13 9887 446
95% 2.00 73.46 14530.76 73457 3973
98% 2.00 219.34 14530.76 263210 17084
99% 2.00 219.34 14530.76 545791 29521
Min 1.00 8.24 88.15 104 0
Max 2.00 219.34 14530.76 785939 35425
kairosdb/data_points histograms —— NODE3 load 61.56MB
Percentile SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
50% 1.00 14.24 943.13 1331 50
75% 1.00 29.52 1131.75 9887 446
95% 1.00 61.21 1131.75 73457 3973
98% 1.00 152.32 1131.75 315852 17084
99% 1.00 654.95 1131.75 545791 29521
Min 1.00 4.77 785.94 73 0
Max 1.00 654.95 1131.75 785939 35425
Upvotes: 0
Reputation: 27294
Even with a significant imbalance in the primary token ranges, something about the load is not right: if you were using an RF of 3, all 3 nodes would hold a replica of all the data, and any primary-range imbalance would not be visible.
The imbalance you have posted points to the use of RF=1, and potentially a poor data model / partition key which is hotspotting the data onto a single node.
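To illustrate the point: with RF=1 each partition lives on exactly one node, so a partition key that receives most of the writes drags its entire data volume onto whichever node owns that token. A hypothetical sketch, using a simple hash placement as a stand-in for Cassandra's Murmur3 token ring (key names and row counts are made up):

```python
import hashlib

def owner(partition_key, num_nodes=3):
    """Toy token placement: hash the key and map it to one of the nodes.
    (Stand-in for Cassandra's Murmur3 partitioner; RF=1, so one replica.)"""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return h % num_nodes

# Made-up workload: 90% of all rows go to a single hot partition key.
rows = ["hot-metric"] * 9000 + [f"metric-{i}" for i in range(1000)]

load = [0, 0, 0]
for key in rows:
    load[owner(key)] += 1

print(load)  # one node carries the hot partition plus its share of the rest
```

With RF=3 on a 3-node cluster every node stores a replica of every partition, so the reported loads would be near-identical regardless of how skewed the key distribution is; that is why the posted numbers point to RF=1.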
Upvotes: 2