Reputation: 153
I recently started exploring Cassandra for a new project. Here is my understanding of Cassandra as of now (based on numerous blogs available online)
Now my questions are, suppose I create a cluster with 3 nodes and RF = 2, then
Sorry if these questions seem very basic, but I spent quite some time on this and could not find definitive answers.
Upvotes: 1
Views: 2331
Reputation: 57808
Providing additional depth:
Token range assignment does have a random component to it, so it's not exactly "even." However, if you're using allocate_tokens_for_keyspace, the new token allocation algorithm takes over and optimizes future range assignments based on the replication factor (RF) of the specified keyspace.
Here is a six-line section of abbreviated output from nodetool ring, from a 3-node cluster built with num_tokens=16. Note that the "range" is effectively the defined starting token (hash) all the way until the next starting token minus one:
-6595849996054463311
-5923426258674511018
-5194860430157391004
-4076256821118426122
-3750110785943336998
-3045824679140675270
Watch what happens when I add a 4th node:
-6595849996054463311
-5923426258674511018
-5711305561631524277
-5194860430157391004
-4831174780910733952
-4076256821118426122
-3750110785943336998
-3659290179273062522
-3045824679140675270
Notice that the starting tokens for each of the three original nodes remain the same. But in this particular case, some ranges on the original nodes were bisected, with the latter half assigned to the new node. Note that once streaming completes, the data in those ranges is still on the original nodes. That's by design, should the bootstrap process for the new node fail. Once things are stable, you can get rid of that data by running nodetool cleanup on the original three nodes.
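The bisection can be checked directly from the two token lists quoted above. This is only a sketch in Python: the helper name split_ranges is made up for illustration, and because the nodetool output is abbreviated, it does not say which node owns which token.

```python
from bisect import bisect_left

# Tokens copied from the abbreviated nodetool ring output above.
before = [
    -6595849996054463311, -5923426258674511018, -5194860430157391004,
    -4076256821118426122, -3750110785943336998, -3045824679140675270,
]
after = [
    -6595849996054463311, -5923426258674511018, -5711305561631524277,
    -5194860430157391004, -4831174780910733952, -4076256821118426122,
    -3750110785943336998, -3659290179273062522, -3045824679140675270,
]

def split_ranges(before, after):
    """Map each newly added token to the pair of old tokens it falls between."""
    old = sorted(before)
    added = sorted(set(after) - set(old))
    result = {}
    for token in added:
        i = bisect_left(old, token)
        lower = old[i - 1] if i > 0 else None      # None: outside the excerpt
        upper = old[i] if i < len(old) else None
        result[token] = (lower, upper)
    return result

splits = split_ranges(before, after)
for token, (lower, upper) in splits.items():
    print(f"{token} splits the span between {lower} and {upper}")
```

Every original token is still present in the second list, and each of the three new tokens lands inside a span that previously had no token in it.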
What happens when a node goes down? Does C* rebalance the ring by redistributing the tokens and moving data around?
This happens when nodetool removenode is run. But for a node simply going down, "hints" are stored on the remaining nodes to be replayed once the down node comes back.
Upvotes: 5
Reputation: 979
I will try to explain this in a simple way.
Cassandra keeps configuration simple: all of it is done in cassandra.yaml. You can also go through THIS to get a picture of partitioning in the cluster.
Let's start with the basics: instead of three nodes, let's use only one node for now. With the default configuration of Cassandra we get the values below in the cassandra.yaml file:
num_tokens: 1
initial_token: 0
This means there is only one node, and all the partitions will reside on it. The idea behind virtual nodes is, in simple terms, that Cassandra divides the token space into multiple ranges even though there are no additional physical nodes. How do we enable virtual nodes in cassandra.yaml? The answer is the num_tokens value:
num_tokens: 128
#initial_token: 0
This configuration creates 128 token ranges, for example 0-10, 11-20, 21-30 and so on. Keep initial_token commented out; this means we want Cassandra to decide the initial token (one less thing to worry about).
Now let's add another node to the cluster. Below is a simple configuration for the new node. Consider the first node IP as and second node IP as for simplicity.
num_tokens: 128
#initial_token: 0
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: ","
We have just added a new node to our cluster, and node1 will serve as the seed node. The num_tokens value is 128, which means 128 token ranges. initial_token is commented out, so Cassandra will decide the initial token and ranges. Data transfer starts as soon as the new node joins the cluster.
For the third node, the configuration would be as below:
num_tokens: 128
#initial_token: 0
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: ",,"
So the third node will take over a few token ranges from node1 and a few from node2.
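To see how a third node ends up taking a slice from both existing nodes, here is a rough Python simulation. It is a sketch under stated assumptions, not Cassandra's allocator: tokens are drawn uniformly at random from the signed 64-bit range, each token is credited with the span back to the previous token, and the names assign_tokens and ownership are made up for this example.

```python
import random

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
RING_SIZE = 2**64

def assign_tokens(nodes, num_tokens, seed=42):
    """Give every node num_tokens random, distinct tokens (illustrative only)."""
    rng = random.Random(seed)
    ring = {}
    for node in nodes:
        count = 0
        while count < num_tokens:
            token = rng.randint(MIN_TOKEN, MAX_TOKEN)
            if token not in ring:
                ring[token] = node
                count += 1
    return ring

def ownership(ring):
    """Fraction of the ring credited to each node.

    Assumption: a token is credited with the span back to the previous
    token, wrapping around at the ends of the ring.
    """
    tokens = sorted(ring)
    share = {}
    for i, token in enumerate(tokens):
        prev = tokens[i - 1]  # for i == 0 this wraps to the last token
        span = (token - prev) % RING_SIZE
        share[ring[token]] = share.get(ring[token], 0) + span
    return {node: s / RING_SIZE for node, s in share.items()}

two = ownership(assign_tokens(["node1", "node2"], 128))
three = ownership(assign_tokens(["node1", "node2", "node3"], 128))
for node in ("node1", "node2", "node3"):
    print(node, round(two.get(node, 0.0), 3), "->", round(three[node], 3))
```

With the same seed, node1 and node2 keep their tokens in both runs, so the printout shows each of them giving up part of its share once node3 joins.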
I hope that answers questions 1 and 2. Let's move on to the next two questions.
When a node goes down, hinted handoff helps Cassandra maintain consistency. One of the remaining two nodes keeps hints for the data that was supposed to be written to the down node. Once the node comes back up, these hints are replayed and the data is written to the target node. There is no need for repartitioning or rebalancing. Hints are stored in a directory that can be configured in cassandra.yaml. By default, hints are collected for 3 hours, so a failed node should come back within 3 hours; after that window, no new hints are stored for it. This value is also configurable in cassandra.yaml.
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hints_directory: /home/ubuntu/node1/data/hints
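The hint window can be pictured with a small in-memory model. This is a simplified sketch, not Cassandra's implementation: the HintStore class and its methods are hypothetical, timestamps are passed in explicitly, and hints live in a plain list rather than in the hints directory.

```python
class HintStore:
    """Toy model of hinted handoff with a maximum hint window."""

    def __init__(self, max_hint_window_ms=10_800_000):  # 3 hours, as in the config above
        self.max_hint_window_ms = max_hint_window_ms
        self.down_since = {}  # node -> time (ms) it was first seen down
        self.hints = {}       # node -> buffered writes, in arrival order

    def mark_down(self, node, now_ms):
        self.down_since.setdefault(node, now_ms)

    def write(self, node, mutation, now_ms):
        """Buffer a write for a down node; return False once the window has passed."""
        down_at = self.down_since.get(node)
        if down_at is None:
            raise ValueError(f"{node} is not marked down")
        if now_ms - down_at > self.max_hint_window_ms:
            return False  # window exceeded: no new hint is kept
        self.hints.setdefault(node, []).append(mutation)
        return True

    def replay(self, node):
        """The node is back: hand over its buffered writes and forget them."""
        self.down_since.pop(node, None)
        return self.hints.pop(node, [])

store = HintStore()
store.mark_down("node3", now_ms=0)
kept = store.write("node3", "write A", now_ms=60_000)              # 1 minute in
dropped = store.write("node3", "write B", now_ms=4 * 3_600_000)    # 4 hours in
replayed = store.replay("node3")
print(kept, dropped, replayed)
```

Writes that miss the window are not recovered by replay, which is why a node that stays down longer than the window usually needs a repair afterwards.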
Murmur3Partitioner calculates the hash from the partition key columns; let's make our peace with that. There are other partitioners as well, such as RandomPartitioner and ByteOrderedPartitioner.
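To make the "partition key columns in, token out" idea concrete, here is a stand-in sketch in Python. It deliberately does not implement Murmur3: it uses SHA-256 from the standard library and truncates it to a signed 64-bit integer, and the function name token_for and the way columns are joined are assumptions made purely for illustration.

```python
import hashlib

def token_for(*partition_key_columns):
    """Map partition key column values to a signed 64-bit token.

    Stand-in hash only: Cassandra's Murmur3Partitioner uses a different
    hash function and a different encoding of composite keys.
    """
    joined = "\x00".join(str(c) for c in partition_key_columns).encode("utf-8")
    digest = hashlib.sha256(joined).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)

print(token_for("user42"))
print(token_for("user42", "2021-05-20"))
```

The properties that matter are that the same key always produces the same token, and that the token always falls somewhere on the ring, so every node can work out where a row belongs without asking anyone.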
Below is sample output of gossipinfo. Each field is annotated after the "--":
ubuntu@ds201-node1:~$ ./node1/bin/nodetool gossipinfo
generation:1621506507 -- the time at which this node was bootstrapped
STATUS:28:NORMAL,-1316314773810616606 -- status of the node: NORMAL, LEFT, LEAVING, REMOVED, REMOVING, ...
LOAD:2295:110299.0 -- disk space usage
SCHEMA:64:c5c6bdbd-5916-347a-ab5b-21813ab9d135 -- changes if the schema changes
DC:37:Cassandra -- data center of the node
RACK:18:rack1 -- rack of the node within the data center
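If you want to pick these fields apart programmatically, a small Python sketch like the one below can do it. It assumes each line has the shape KEY:version:value (with generation carrying only a value), that the middle field is a version counter, and that the annotations after "--" have been stripped; parse_gossip_line is a made-up helper, not part of nodetool.

```python
def parse_gossip_line(line):
    """Split 'KEY:version:value' into (key, version, value).

    Lines with a single colon, such as 'generation:...', get version None.
    """
    key, _, rest = line.strip().partition(":")
    version, sep, value = rest.partition(":")
    if not sep:
        return key, None, version
    return key, int(version), value

# Raw lines from the sample output above, without the annotations.
lines = [
    "generation:1621506507",
    "STATUS:28:NORMAL,-1316314773810616606",
    "LOAD:2295:110299.0",
    "SCHEMA:64:c5c6bdbd-5916-347a-ab5b-21813ab9d135",
    "DC:37:Cassandra",
    "RACK:18:rack1",
]
state = {key: (version, value) for key, version, value in map(parse_gossip_line, lines)}
for key, (version, value) in state.items():
    print(key, version, value)
```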
Gossip is the protocol that spreads node state across the cluster. No node is a master in a Cassandra cluster; peers exchange state among themselves, which keeps everyone's information up to date. Nodes communicate with randomly chosen peers using the gossip protocol (there are some criteria behind this randomness). Gossip spreads node metadata only, not client data.
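A toy simulation shows how quickly this kind of peer-to-peer exchange spreads state. It is a sketch with simplifying assumptions: peers are picked uniformly at random (real Cassandra applies extra criteria, such as favouring seeds), each entry is a (version, value) pair, and the function names are invented for the example.

```python
import random

def gossip_round(states, rng):
    """Each node picks one random peer; both keep the newer version of every entry."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = dict(states[node])
        for key, (version, value) in states[peer].items():
            if key not in merged or merged[key][0] < version:
                merged[key] = (version, value)
        states[node] = dict(merged)
        states[peer] = dict(merged)

def rounds_to_converge(states, seed=1, max_rounds=100):
    """Run rounds until every node holds the same view; None if it never happens."""
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        views = list(states.values())
        if all(view == views[0] for view in views):
            return round_no
    return None

# Every node starts out knowing only its own state.
cluster = {f"node{i}": {f"node{i}": (1, "NORMAL")} for i in range(1, 6)}
rounds = rounds_to_converge(cluster)
print("converged after", rounds, "rounds")
```

Only small metadata entries travel this way, which matches the point above: gossip carries node state, never client data.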
Hope this clears some doubts.
Upvotes: 2