Cassandra Virtual Nodes

Question

Although this has been asked and answered many times, I have not found a good answer, either in forums or in the Cassandra docs.

How do virtual nodes work?

Suppose a node has 256 virtual nodes, and the docs say they are distributed randomly. (Leave aside how that "randomly" is done; I have another, more urgent question.)

Is it right that every physical Cassandra node is actually responsible for several distinct locations in the ring (256 of them)? Does that mean the physical node is, in effect, spread over the whole circle?
How does rebalancing work in that case? If I add a new node, the ring gets an additional 256 virtual nodes. How do those additional virtual nodes divide the data with the old nodes? Do they basically appear as additional "bicycle spokes" randomly spread around the whole ring?

There is a lot of information on the internet, but nobody gives a clear explanation...

LetsNoSQL · Accepted Answer

Vnodes break up the available range of tokens into smaller ranges; the number of ranges each node owns is set by num_tokens in the cassandra.yaml file. The vnode ranges are randomly distributed across the cluster and are generally non-contiguous. If we use a large value for num_tokens, the random distribution makes hot spots less likely. Statistical analysis indicated that with 256 vnodes per node, clusters of any size had a good token range balance. Hence 256 was the num_tokens default recommended by the community to prevent hot spots in a cluster.
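To make the balance argument concrete, here is a minimal sketch that assigns random tokens to nodes and measures how much of the ring each node ends up owning. It is a toy model, not Cassandra code: the function names (build_ring, ownership, spread) are made up for illustration, the token space is assumed to be the signed 64-bit range, tokens are assumed to be picked uniformly at random, and replication is ignored.

```python
import random
from collections import defaultdict

RING_SIZE = 2**64  # assumed token space: signed 64-bit, -2**63 .. 2**63 - 1


def build_ring(nodes, num_tokens, rng):
    """Give every node num_tokens random tokens; return sorted (token, node) pairs."""
    ring = [(rng.randrange(-2**63, 2**63), node)
            for node in nodes
            for _ in range(num_tokens)]
    ring.sort()
    return ring


def ownership(ring):
    """Fraction of the ring owned by each node.

    A token owns the range from the previous token (exclusive) up to
    itself (inclusive); the first token wraps around to the last one.
    """
    owned = defaultdict(int)
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # i == 0 wraps to the last token
        owned[node] += (token - prev_token) % RING_SIZE
    return {node: size / RING_SIZE for node, size in owned.items()}


def spread(shares):
    """Gap between the most and the least loaded node."""
    return max(shares.values()) - min(shares.values())


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 7)]
    for num_tokens in (1, 16, 256):
        rng = random.Random(42)  # fixed seed so runs are repeatable
        shares = ownership(build_ring(nodes, num_tokens, rng))
        print(f"num_tokens={num_tokens:>3}  spread={spread(shares):.3f}")
```

With one token per node the shares depend entirely on where six random points happen to fall, so the spread is usually large. With many tokens per node each share is a sum of many small ranges, which tends to average out toward 1/6.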

Ans 1: Yes. Each physical node owns a set of token ranges, and num_tokens determines how many. If you set it to 256, which is the default, the node gets 256 token ranges.

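Ans 2: Yes. When you add or remove nodes, token ranges are redistributed across the cluster according to the vnode configuration. A new node takes num_tokens ranges scattered around the ring, so it receives a share of data from many existing nodes instead of from a single neighbour.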
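The "bicycle spokes" picture from the question can be checked with a small simulation. This is only a sketch under simplifying assumptions: tokens are drawn uniformly at random from the signed 64-bit range, only primary ownership is modelled (no replication), and the names random_tokens, owner_of and simulate_join are hypothetical helpers, not Cassandra APIs.

```python
import bisect
import random
from collections import Counter


def random_tokens(count, rng):
    """Draw tokens uniformly from the assumed signed 64-bit token space."""
    return [rng.randrange(-2**63, 2**63) for _ in range(count)]


def owner_of(key_token, ring):
    """Owner is the node holding the first token >= key_token, wrapping around."""
    idx = bisect.bisect_left(ring, (key_token,))
    return ring[idx % len(ring)][1]


def simulate_join(old_nodes, new_node, num_tokens, num_keys, seed=0):
    """Return ((old_owner, new_owner) for every key that moved, total key count)."""
    rng = random.Random(seed)
    ring = sorted((t, n) for n in old_nodes
                  for t in random_tokens(num_tokens, rng))
    keys = random_tokens(num_keys, rng)
    before = [owner_of(k, ring) for k in keys]

    # The new node adds its own tokens; existing tokens stay where they are.
    new_ring = sorted(ring + [(t, new_node)
                              for t in random_tokens(num_tokens, rng)])
    after = [owner_of(k, new_ring) for k in keys]

    moved = [(b, a) for b, a in zip(before, after) if b != a]
    return moved, len(keys)


if __name__ == "__main__":
    moved, total = simulate_join(["node1", "node2", "node3"], "node4",
                                 num_tokens=256, num_keys=20_000, seed=7)
    print(f"keys moved: {len(moved)} of {total}")
    print("moved away from:", dict(Counter(old for old, _ in moved)))
    print("moved to:", dict(Counter(new for _, new in moved)))
```

In this model every key that changes owner moves to the new node, and the keys it takes come from all of the old nodes, each giving up a roughly similar share. Nothing is shuffled between the old nodes themselves.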

For more details, see https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html
