Reputation: 175
I am configuring a 6-node Cassandra cluster on AWS EC2, 3 nodes in one region and 3 nodes in another region:
eu-central-1
- node0 cass-db-0 10.10.37.79 eu-central-1a
- node1 cass-db-1 10.10.38.229 eu-central-1b
- node2 cass-db-2 10.10.36.76 eu-central-1a
eu-west-1
- node3 cass-db-0 10.10.37.80 eu-west-1a
- node4 cass-db-1 10.10.39.177 eu-west-1b
- node5 cass-db-2 10.10.37.231 eu-west-1a
I have completed the local configuration in cassandra.yaml.
Now, I need to configure cassandra-rackdc.properties and cassandra-topology.properties but I don't understand the network topology.
Please advise.
Upvotes: 2
Views: 1115
Reputation: 57748
Erick provides some great background here, which should be helpful for you. In terms of getting to a simple solution, I'd recommend this:
- Use the GossipingPropertyFileSnitch in the cassandra.yaml.
- Delete the cassandra-topology.properties file.
- In the cassandra-rackdc.properties, set dc=eu-west-1 for the 3 "west" nodes; likewise, set dc=eu-central-1 for the "central" nodes.
If you were using AZs 1a, 1b, and 1c, I'd say to use the AZ for the rack property. Erick mentions defining your keyspaces with an RF of 3, which is solid advice. Typically, you'll want the number of AZs to match your RF for even data distribution and availability, which is why I'd recommend leaving rack at the default value for all nodes.
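As a concrete sketch of the above (the rack value shown is just the stock default that ships in the file, not something you need to change), each node's cassandra-rackdc.properties ends up containing only two properties:

# cassandra-rackdc.properties on the three eu-central-1 nodes
dc=eu-central-1
rack=rack1

# cassandra-rackdc.properties on the three eu-west-1 nodes
dc=eu-west-1
rack=rack1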
Likewise, your keyspace definitions would look something like this:
CREATE KEYSPACE keyspace_name WITH REPLICATION =
{'class':'NetworkTopologyStrategy',
'eu-west-1':'3',
'eu-central-1':'3'};
The main point to consider is that your data center names must match between the keyspace definition and the entries in the cassandra-rackdc.properties files.
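A quick way to verify that the names line up (my suggestion, not required): run nodetool status once the nodes are up, and check that its Datacenter headers match exactly what you used in the keyspace definition. Abbreviated, illustrative output:

Datacenter: eu-central-1
========================
--  Address       Load  Tokens  Owns  Host ID  Rack
UN  10.10.37.79   ...   ...     ...   ...      rack1
...

Datacenter: eu-west-1
=====================
UN  10.10.37.80   ...   ...     ...   ...      rack1
...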
Upvotes: 1
Reputation: 16303
When you are building a cluster, you would typically start with the network topology. In your case, your choice of 2 regions indicates to me that you would like to have two logical Cassandra DCs, each with 3 nodes.
For best practice, we recommend configuring your keyspaces with a replication factor (RF) of 3 in each DC. This means that (a) there are 3 copies of the data, and (b) your cluster is configured for high availability.
With RF:3, you would ideally have an equivalent number of logical C* racks in each DC. In your case this is not possible because you only have 2 AZs, so the topology design means that you will need to place all nodes in one logical C* rack.
A snitch determines which DCs and racks nodes belong to. There are several snitches to choose from, and your choice of snitch will determine which .properties file to configure.
GossipingPropertyFileSnitch (GPFS) automatically propagates each node's DC and rack to the rest of the cluster via gossip. GPFS is recommended in all cases because it will future-proof your cluster. Unless you have C* expertise and a strong preference for another snitch, it is best practice to stick with GPFS. When using GPFS, you will need to define the node's DC and rack in the cassandra-rackdc.properties file. For details, see GossipingPropertyFileSnitch.
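As a minimal sketch of that wiring (file locations vary by install, but the property names are the standard ones), a eu-west-1 node would look like this:

# cassandra.yaml
endpoint_snitch: GossipingPropertyFileSnitch

# cassandra-rackdc.properties (per node; eu-west-1 values shown)
dc=eu-west-1
rack=rack1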
PropertyFileSnitch (PFS) is the precursor to GPFS and determines the network topology based on what you've configured in the cassandra-topology.properties file. With PFS, each node has a full list of all nodes in the cluster, so when you add/remove nodes, you have to update the cassandra-topology.properties file on every single node (details here). This is tedious, which is why users prefer GPFS.
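To see why this gets tedious, here is what a hypothetical cassandra-topology.properties for your 6 nodes would look like; an identical copy has to be maintained on every node, and every add/remove means editing all 6 copies:

# cassandra-topology.properties (same file on all nodes)
10.10.37.79=eu-central-1:rack1
10.10.38.229=eu-central-1:rack1
10.10.36.76=eu-central-1:rack1
10.10.37.80=eu-west-1:rack1
10.10.39.177=eu-west-1:rack1
10.10.37.231=eu-west-1:rack1
# fallback for nodes not listed above
default=eu-central-1:rack1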
WARNING: If you are not using PropertyFileSnitch, we recommend that you delete the cassandra-topology.properties file on every single node, because it's been known to cause intermittent gossip issues, as I've documented here: https://community.datastax.com/questions/4621/
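For example (assuming a package install where the configs live under /etc/cassandra; adjust the path for your layout), on each node:

sudo rm /etc/cassandra/cassandra-topology.properties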
There are other snitches available (see the docs here) but I won't go through them here since we think GPFS is the right choice in all cases. Cheers!
Upvotes: 2