Reputation: 75
I understand that Cassandra tries to replicate data across different racks and different DCs to handle failover. For example, suppose I have an 8-node cluster spanning 2 DCs, with 2 racks in each DC.
Node 1 - DC1 RACK 1 | Node 2 - DC1 RACK 1 | Node 3 - DC1 RACK 2 | Node 4 - DC1 RACK 2 | Node 5 - DC2 RACK 1 | Node 6 - DC2 RACK 1 | Node 7 - DC2 RACK 2 | Node 8 - DC2 RACK 2
Now if I have an RF of 3, then when writing a row, Cassandra will store the 1st copy of the row on the node responsible for the row's token range. Let's say that is Node 1, which is in DC1, RACK 1. Cassandra now needs to store 2 more replicas to fulfill the RF = 3 criterion.
1) Let's say that, to handle rack failure and to allow local reads, it stores the 2nd replica on a node in the same DC but a different rack. So the 2nd replica will be stored on either Node 3 or Node 4, which are in DC1, RACK 2. My question is: on what basis does Cassandra choose between Node 3 and Node 4? How does it decide that Node 3 is preferred over Node 4, or vice versa?
2) To handle DC failure, it stores the 3rd copy in the other DC (DC2). Now there are 2 choices to make: a) It has to choose between RACK 1 and RACK 2 in DC2. My 2nd question is: on what basis/logic does it choose among multiple racks in the same data center? b) Let's imagine it chooses RACK 1 in DC2; then it has to choose between Node 5 and Node 6 for the replica. My 3rd question is: on what basis does Cassandra choose a node within the same rack? (This is basically the same as the 1st question.)
Upvotes: 0
Views: 897
Reputation: 586
It depends on the replication strategy you choose for your keyspace.
In SimpleStrategy
it just takes consecutive nodes on the ring, starting from the node that owns the row's token. This is meant as a simple strategy for a single DC (I'm guessing you don't use this, but I added it for completeness).
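To make "consecutive nodes on the ring" concrete, here is a minimal Python sketch. It is a simplified model, not Cassandra's code: the function name `simple_strategy_replicas` and the tokens 100..800 are made up for illustration (real tokens come from the partitioner, and with vnodes each node owns many of them).

```python
from bisect import bisect_left

def simple_strategy_replicas(ring, row_token, rf):
    """Return rf distinct nodes, walking clockwise from the row's token.

    ring is a list of (token, node) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    # The first node whose token is >= the row's token owns the row;
    # the modulo wraps around past the highest token.
    start = bisect_left(tokens, row_token) % len(ring)
    replicas = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in replicas:  # matters only if a node has several tokens
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas

# Hypothetical ring: node1..node8 with tokens 100..800
ring = [(i * 100, f"node{i}") for i in range(1, 9)]
print(simple_strategy_replicas(ring, 250, 3))
```

With these made-up tokens, a row whose token is 250 lands on node3 first and then on the next two nodes clockwise, regardless of rack or DC.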
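In NetworkTopologyStrategy
you define the number of replicas per DC (so an RF of 3 is not split across DCs automatically; you would set, say, 2 for DC1 and 1 for DC2). Within each DC, Cassandra walks the ring clockwise from the row's token and places the next replica on the first node it reaches that is in another rack. So whether Node 3 or Node 4 gets the replica depends only on which of them comes first on the ring after the row's token; the same goes for the choice of rack and node in DC2.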
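The rack-aware walk can be sketched in Python using the 8-node layout from the question. This is a simplified model under assumptions, not Cassandra's implementation: `nts_replicas`, the tokens 100..800, and the fallback to already-used racks when a DC has fewer racks than replicas are all illustrative.

```python
from bisect import bisect_left

def nts_replicas(ring, topology, row_token, rf_per_dc):
    """Pick replicas per DC, preferring racks that hold no replica yet.

    ring: list of (token, node) sorted by token
    topology: node -> (dc, rack)
    rf_per_dc: dc -> number of replicas wanted in that DC
    """
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, row_token) % len(ring)
    # Nodes in clockwise order, starting at the owner of the row's token.
    walk = [ring[(start + i) % len(ring)][1] for i in range(len(ring))]

    result = {}
    for dc, rf in rf_per_dc.items():
        chosen, seen_racks, skipped = [], set(), []
        for node in walk:
            node_dc, rack = topology[node]
            if node_dc != dc or len(chosen) == rf:
                continue
            if rack not in seen_racks:
                chosen.append(node)
                seen_racks.add(rack)
            else:
                skipped.append(node)  # rack already holds a replica
        # Assumed fallback: if racks ran out, reuse skipped nodes in ring order.
        chosen += skipped[: rf - len(chosen)]
        result[dc] = chosen
    return result

# Layout from the question, with hypothetical tokens 100..800
topology = {
    "node1": ("DC1", "RACK1"), "node2": ("DC1", "RACK1"),
    "node3": ("DC1", "RACK2"), "node4": ("DC1", "RACK2"),
    "node5": ("DC2", "RACK1"), "node6": ("DC2", "RACK1"),
    "node7": ("DC2", "RACK2"), "node8": ("DC2", "RACK2"),
}
ring = [(i * 100, f"node{i}") for i in range(1, 9)]

print(nts_replicas(ring, topology, 50, {"DC1": 2, "DC2": 1}))
print(nts_replicas(ring, topology, 350, {"DC1": 2, "DC2": 1}))
```

Under these assumed tokens, a row with token 50 goes to node1 and node3 in DC1, while a row with token 350 goes to node4 and node1. Neither node3 nor node4 is "preferred"; each wins for the rows whose clockwise walk reaches it first.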
See the Cassandra data replication docs, and a blog with nice visuals.
Regards,
Jony
Upvotes: 1