Vladimir Nabokov

Reputation: 1907

Cassandra Virtual Nodes

Although this has been asked and answered many times, I still have not found a good answer, either in forums or in the Cassandra docs.

How do virtual nodes work?

Suppose a node has 256 virtual nodes, and the docs say they are distributed randomly. (Leave aside how that "randomly" is done; I have another, more urgent question.)

  1. Is it right that every Cassandra node ("physical") is actually responsible for several distinct locations on the ring (256 of them)? Does that mean the "physical" node is, in a sense, spread over the whole circle?

  2. How does rebalancing work in that case, if I add a new node? The ring will get an additional 256 virtual nodes. How will those additional nodes divide the data with the old nodes? Will they basically appear as additional "bicycle spokes" randomly spread around the whole ring?

There is a lot of information on the internet, but nobody gives a clear explanation...

Upvotes: 1

Views: 638

Answers (2)

Serban Teodorescu

Reputation: 1406

LetsNoSQL's answer is correct. See also https://stackoverflow.com/a/37982696/5209009. I'll only add a few more comments:

  1. Yes, the "physical" node is spread on the token range.
  2. As explained in the link, any new node will take 256 new token ranges, dividing some of the existing ones. There is no other rebalancing; it relies on randomness to even out the load, which is why it uses a relatively large number (256) of tokens per node.
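To make the two points above concrete, here is a minimal Python sketch of a ring with randomly placed tokens. It is a toy model, not Cassandra code: the names `assign_tokens` and `ownership` are made up for illustration, tokens are simplified to integers in `0..2**64-1`, and replication is ignored.

```python
import random

RING_SIZE = 2**64  # assumption: simplified token space 0..2**64-1


def assign_tokens(ring, node, num_tokens, rng):
    """Give `node` num_tokens random positions (tokens) on the ring."""
    for _ in range(num_tokens):
        ring[rng.randrange(RING_SIZE)] = node


def ownership(ring):
    """Fraction of the ring owned by each node.

    A token owns the range (previous token, token], wrapping around.
    """
    tokens = sorted(ring)
    owned = {}
    prev = tokens[-1] - RING_SIZE  # wrap-around: last token precedes the first
    for t in tokens:
        owned[ring[t]] = owned.get(ring[t], 0) + (t - prev)
        prev = t
    return {node: size / RING_SIZE for node, size in owned.items()}


rng = random.Random(42)
ring = {}  # token -> physical node
for name in ("A", "B", "C"):
    assign_tokens(ring, name, 256, rng)
before = ownership(ring)

# Adding node D only inserts 256 new tokens; existing tokens never move.
assign_tokens(ring, "D", 256, rng)
after = ownership(ring)

for name in sorted(after):
    print(name, round(before.get(name, 0.0), 3), "->", round(after[name], 3))
```

Each new token of D cuts one existing range in two and takes over the lower part, so every old node gives up a bit of data and nothing else is reshuffled. That is the "bicycle spokes" picture from the question.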

It's worth mentioning that there is another option. You can run vnodes with a smaller number of tokens per node (4-8) together with a token allocation algorithm. New tokens are then not allocated randomly; a greedy algorithm places them so that the resulting distribution optimises the load for a given keyspace. In essence, it splits in half the token ranges that contain the most data. Since it is not random, it can work with a smaller number of tokens. This is not really relevant for small clusters, but for clusters of 100+ nodes it can be.
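A rough sketch of the greedy idea, in Python. This is a deliberately simplified stand-in, not the actual Cassandra allocator: it assumes data is spread uniformly, so the widest range is treated as the one with the most data, and it ignores replication. The helper names `widest_range` and `add_node_greedy` are hypothetical.

```python
import random

RING_SIZE = 2**64  # assumption: simplified token space 0..2**64-1


def widest_range(ring):
    """Return (width, midpoint) of the widest range (previous token, token]."""
    tokens = sorted(ring)
    prev = tokens[-1] - RING_SIZE  # wrap-around
    best_width, best_mid = 0, None
    for t in tokens:
        width = t - prev
        if width > best_width:
            best_width = width
            best_mid = (prev + width // 2) % RING_SIZE
        prev = t
    return best_width, best_mid


def add_node_greedy(ring, node, num_tokens):
    """Place each new token at the midpoint of the currently widest range."""
    for _ in range(num_tokens):
        _, mid = widest_range(ring)
        ring[mid] = node


rng = random.Random(7)
# Three existing nodes with 4 random tokens each.
ring = {rng.randrange(RING_SIZE): name
        for name in ("A", "B", "C") for _ in range(4)}
widest_before, _ = widest_range(ring)

add_node_greedy(ring, "D", 4)
widest_after, _ = widest_range(ring)

print(widest_before / RING_SIZE, "->", widest_after / RING_SIZE)
```

Because every new token is aimed at the worst range, a handful of tokens per node is enough to smooth the ring, whereas random placement needs many tokens to average out.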

See https://www.datastax.com/blog/2016/01/new-token-allocation-algorithm-cassandra-30 and https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html.

Upvotes: 2

LetsNoSQL

Reputation: 1538

Vnodes break up the available range of tokens into smaller ranges, with the number of ranges per node set by num_tokens in the cassandra.yaml file. The vnode ranges are randomly distributed across the cluster and are generally non-contiguous. If we use a large value for num_tokens to break up the token ranges, the random distribution makes hot spots less likely. Statistical computation showed that 256 vnodes was the point at which clusters of any size had a good token range balance. Hence, the num_tokens default value of 256 was recommended by the community to prevent hot spots in a cluster.
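The effect of num_tokens on balance can be checked with a small simulation. The sketch below is only an illustration under simplifying assumptions (tokens are random floats in [0, 1), replication is ignored, and `max_over_mean` is a made-up helper); it reports how much more of the ring the most loaded node owns compared with the average node.

```python
import random


def max_over_mean(num_nodes, num_tokens, seed=0):
    """Ratio of the largest node's ring share to the average share.

    1.0 means perfect balance; larger values mean a hot spot.
    """
    rng = random.Random(seed)
    # One (token, node) pair per vnode, sorted by position on the ring.
    tokens = sorted((rng.random(), node)
                    for node in range(num_nodes)
                    for _ in range(num_tokens))
    owned = [0.0] * num_nodes
    prev = tokens[-1][0] - 1.0  # wrap-around: last token precedes the first
    for t, node in tokens:
        owned[node] += t - prev  # a token owns (previous token, token]
        prev = t
    return max(owned) * num_nodes  # average share is 1 / num_nodes


for num_tokens in (1, 4, 16, 64, 256):
    runs = [max_over_mean(6, num_tokens, seed) for seed in range(20)]
    print(num_tokens, round(sum(runs) / len(runs), 2))
```

With one random token per node the largest share is typically far above the average, and the ratio moves toward 1.0 as num_tokens grows.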

Ans 1: Each node owns a set of token ranges based on num_tokens. If you have set it to 256, which is the default, the node will get 256 token ranges.

Ans 2: Yes, when you add or remove nodes, the tokens are redistributed across the cluster based on the vnode configuration.

For more details, you may refer to https://docs.datastax.com/en/ddac/doc/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html

Upvotes: 3
