Reputation: 55
My Cassandra cluster consists of 2 DCs; each DC has 5 nodes and the replication factor per DC is 3. Both DCs are hosted on the same Docker orchestrator. This is legacy and was probably set up during the last major system migration years ago. At the moment I don't see any advantage in having 2 DCs with the same replication factor of 3: this way the same data is written 6 times. The cluster is at least 80% write-heavy; reads are fairly limited.
Cassandra is struggling under load at peak times, so I would like to have 1 DC with 10 nodes (instead of 2 DCs x 5 nodes) to be able to balance across 10 nodes instead of just 5. This would also bring down the data size per node. With the same amount of RAM and CPU dedicated to Cassandra, I would gain performance and free up storage space ;-)
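For reference, the current keyspace replication presumably looks something like this (keyspace name is a placeholder), which is where the 6 copies come from:

    -- 3 replicas per DC = 6 copies of every partition cluster-wide
    CREATE KEYSPACE my_ks
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC1': 3, 'DC2': 3};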
So the idea is to decommission DC2 and bring all 5 of its nodes into DC1 as brand new nodes. The steps for that are known.
So my questions are:
Is there a way to alter the keyspace to DC1 only, then just bring all DC2 nodes down, erase their volumes, and add them one by one to DC1, expanding DC1? Basically, decommission all DC2 nodes at once?
Is there a way to move those 5 DC2 nodes to DC1 even more quickly (after all, they contain the same data as the 5 nodes in DC1)? Like just joining them to DC1 with all the data they already contain?
What is the advantage of having 2 DCs in a single cluster, instead of having a single DC with double the nodes? Or does it strongly depend on the usage and the way services write and read data from Cassandra?
Appreciate your replies, thanks.
Cheers, OvivO
Upvotes: 2
Views: 202
Reputation: 57798
Is there a way to alter the keyspace to DC1 only, then just bring all DC2 nodes down, erase their volumes, and add them one by one to DC1, expanding DC1? Basically, decommission all DC2 nodes at once?
Yes, you can adjust the keyspace definition to replicate only within DC1. Since you're basically removing a DC, you could shut all of the DC2 nodes down and run a nodetool removenode for each. In theory, that removes the nodes from gossip and (since they're down) doesn't attempt to move any data around. Then yes, add each node back to DC1, one at a time. Once you're done, run a repair, followed by a nodetool cleanup on each node.
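A rough sketch of those steps as commands, assuming a keyspace named my_ks, a datacenter named DC1, and RF 3 (adjust the names, RF, and host IDs to your cluster):

    -- in cqlsh: restrict replication to DC1 only
    ALTER KEYSPACE my_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};

    # on a live DC1 node: remove each downed DC2 node by its host ID (from nodetool status)
    nodetool removenode <dc2-host-id>

    # after re-adding the five nodes to DC1: repair, then drop data the nodes no longer own
    nodetool repair my_ks
    nodetool cleanup my_ks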
Is there a way to move those 5 DC2 nodes to DC1 even more quickly (after all, they contain the same data as the 5 nodes in DC1)? Like just joining them to DC1 with all the data they already contain?
No. Token range assignment is DC-dependent. If they moved to a new DC, their range assignments would change, and the nodes would very likely be responsible for different ranges of data.
What is the advantage of having 2 DCs in a single cluster, instead of having a single DC with double the nodes?
Geographic awareness. If you have a mobile app and users on both the West Coast and East Coast, you don't want your East Coast users making a call for data all the way to the West Coast. You want that data call to happen as locally as possible. So, you'd build up a DC on each coast, and let Cassandra keep them in-sync.
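As a sketch (keyspace and DC names here are hypothetical), each DC gets its own replica count, and clients on each coast use a DC-local consistency level so requests stay in the nearest DC while Cassandra replicates across DCs in the background:

    CREATE KEYSPACE app_data
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'us_east': 3, 'us_west': 3};

    -- in cqlsh (drivers expose the same setting): keep reads/writes local to the nearest DC
    CONSISTENCY LOCAL_QUORUM;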
Upvotes: 1