Reputation: 1907
Although this has been asked and answered many times, I still haven't found a good answer, neither in forums nor in the Cassandra docs.
How do virtual nodes work?
Suppose a node has 256 virtual nodes, and the docs say they are distributed randomly. (Setting aside how that "randomly" is done, I have another, more urgent question):
Is it right that every ("physical") Cassandra node is actually responsible for several distinct locations in the ring (256 locations)? Does that mean the "physical" node is sort of "spread" over the whole circle?
How does rebalancing work in that case, if I add a new node? The ring will get an additional 256 virtual nodes. How will those additional vnodes divide the data with the old nodes? Will they basically appear as additional "bicycle spokes" randomly spread around the whole ring?
There is a lot of info on the internet, but nobody gives a clear explanation...
Upvotes: 1
Views: 638
Reputation: 1406
LetsNoSQL's answer is correct. See also https://stackoverflow.com/a/37982696/5209009. I'll only add a few more comments:
It's worth mentioning that there is another option: you can run vnodes with a smaller number of tokens per node (4-8) together with a token allocation algorithm. New tokens are then not allocated randomly; instead, a greedy algorithm chooses them so that the resulting distribution optimises the load on a given keyspace. In effect, it splits in half the token ranges containing the most data. Because it is not random, it works well with a small number of tokens (4-8). This doesn't matter much for small clusters, but for clusters of 100+ nodes it can.
See https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30 and https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.
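To make the greedy idea concrete, here is a minimal Python sketch under simplifying assumptions: data is spread uniformly, so a range's size stands in for its load, and replication is ignored. The allocator (greedy_tokens) and the other helpers are hypothetical and only mimic the "split the biggest range in half" behaviour described above; Cassandra's real allocator (enabled in 3.x via allocate_tokens_for_keyspace in cassandra.yaml) also accounts for the keyspace's replication.

```python
import random

# Assumption: a simplified token ring of size 2**64. Murmur3Partitioner really uses
# the signed range [-2**63, 2**63 - 1], but the arithmetic is the same modulo 2**64.
RING_SIZE = 2**64


def ranges(tokens):
    """Yield (start, end, size) for each ring range (start, end], wrapping around."""
    ordered = sorted(tokens)
    for i, end in enumerate(ordered):
        start = ordered[i - 1]  # i == 0 picks the last token: the wrap-around range
        size = (end - start) % RING_SIZE or RING_SIZE  # a single token owns the whole ring
        yield start, end, size


def greedy_tokens(existing, num_tokens):
    """Hypothetical simplified allocator: each new token splits the largest range in half.

    Uniform data is assumed, so range size stands in for load; replication is ignored.
    """
    tokens = list(existing)
    new = []
    for _ in range(num_tokens):
        start, _end, size = max(ranges(tokens), key=lambda r: r[2])
        mid = (start + size // 2) % RING_SIZE  # midpoint of the current largest range
        tokens.append(mid)
        new.append(mid)
    return new


def ownership(token_owner):
    """Fraction of the ring owned by each node; range (start, end] belongs to the owner of end."""
    share = {}
    for _start, end, size in ranges(token_owner):
        owner = token_owner[end]
        share[owner] = share.get(owner, 0.0) + size / RING_SIZE
    return share


if __name__ == "__main__":
    random.seed(42)
    # Three existing nodes with 4 random tokens each (random allocation, no allocator).
    token_owner = {random.getrandbits(64): node for node in "ABC" for _ in range(4)}
    for token in greedy_tokens(token_owner, 4):
        token_owner[token] = "D"  # hypothetical new node D gets greedily placed tokens
    for node, frac in sorted(ownership(token_owner).items()):
        print(f"{node}: {frac:.1%}")
```

Because each greedy token lands in the middle of the current largest gap, a handful of tokens is enough to relieve the biggest ranges, which is why this approach can get away with 4-8 tokens per node instead of relying on 256 random ones.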
Upvotes: 2
Reputation: 1538
Vnodes break up the available range of tokens into smaller ranges, whose number is defined by the num_tokens setting in the cassandra.yaml file. The vnode ranges are randomly distributed across the cluster and are generally non-contiguous. If we use a large value for num_tokens, the random distribution makes hot spots less likely. Statistical computation showed that clusters of any size consistently had a good token range balance once 256 vnodes were used. Hence, a num_tokens default of 256 was recommended by the community to prevent hot spots in a cluster.
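The statistical argument can be illustrated with a small simulation. This Python sketch assumes uniformly distributed data and ignores replication; it gives each of six hypothetical nodes num_tokens random tokens and reports the ratio between the most- and least-loaded node. The helper names are made up for illustration.

```python
import random

RING_SIZE = 2**64  # simplified token space (Murmur3Partitioner uses [-2**63, 2**63 - 1])


def random_ring(nodes, num_tokens, rng):
    """Hypothetical helper: give every physical node num_tokens random tokens."""
    return {rng.getrandbits(64): node for node in nodes for _ in range(num_tokens)}


def ownership(token_owner):
    """Fraction of the ring each node owns; range (previous token, token] belongs to the token's owner."""
    ordered = sorted(token_owner)
    share = dict.fromkeys(token_owner.values(), 0.0)
    for i, token in enumerate(ordered):
        # i == 0 measures the wrap-around range from the last token back to the first
        size = (token - ordered[i - 1]) % RING_SIZE or RING_SIZE
        share[token_owner[token]] += size / RING_SIZE
    return share


def imbalance(token_owner):
    """Ratio of the most-loaded to the least-loaded node (1.0 is perfectly even)."""
    share = ownership(token_owner)
    return max(share.values()) / min(share.values())


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = [f"node{i}" for i in range(6)]
    for num_tokens in (1, 8, 256):
        ring = random_ring(nodes, num_tokens, rng)
        print(f"num_tokens={num_tokens:>3}  max/min ownership = {imbalance(ring):.2f}")
```

With a single token per node the ratio is usually far from 1.0, and it moves towards 1.0 as num_tokens grows; the exact numbers depend on the random seed.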
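Ans 1: Each node owns a number of token ranges determined by num_tokens. If you set it to 256, which is the default, the node gets 256 token ranges.
Ans 2: Yes. When you add a node, its new tokens are placed on the ring and each one takes over part of an existing range, so the new node picks up slices of data from many existing nodes; when you remove a node, its ranges are handed over to the remaining nodes. How this works depends on the vnode configuration.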
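To see why a joining node's vnodes behave like extra "spokes", here is a minimal Python sketch. It assumes a replication factor of 1 and uses hypothetical helper names. Each new random token lands inside some existing range, and the node that owned that range hands over the slice ending at the new token. With many vnodes per node, the new node ends up taking small slices from almost every existing node rather than one large slice from a single neighbour.

```python
import bisect
import random

RING_SIZE = 2**64  # simplified token space (Murmur3Partitioner uses [-2**63, 2**63 - 1])


def owner_of(token, ring_tokens, token_owner):
    """Node owning `token`: the owner of the first ring token >= token, wrapping to the start."""
    i = bisect.bisect_left(ring_tokens, token)
    return token_owner[ring_tokens[i % len(ring_tokens)]]


def donors_when_joining(token_owner, new_tokens):
    """Hypothetical helper: for each new token, the existing node that currently owns that position.

    That node hands the slice of its range ending at the new token to the joining node.
    Replication is ignored (RF=1), so there is exactly one donor per new token.
    """
    ring_tokens = sorted(token_owner)
    return [owner_of(t, ring_tokens, token_owner) for t in new_tokens]


if __name__ == "__main__":
    rng = random.Random(3)
    nodes = [f"node{i}" for i in range(5)]
    # Existing cluster: 5 nodes with 256 random vnodes each.
    token_owner = {rng.getrandbits(64): n for n in nodes for _ in range(256)}
    new_tokens = [rng.getrandbits(64) for _ in range(256)]  # the joining node's random vnodes
    donors = donors_when_joining(token_owner, new_tokens)
    for node in nodes:
        print(node, "gives up", donors.count(node), "ranges")
```

For example, on a toy ring {10: "A", 20: "B", 30: "C"}, new tokens 15 and 35 are taken from B and A respectively, because 35 falls in the wrap-around range (30, 10], which A owns.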
For more details, see https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html
Upvotes: 3