Haytham

Reputation: 175

Configuring cassandra-rackdc and cassandra-topology

I am configuring a 6-node Cassandra cluster on AWS EC2, 3 nodes in one region and 3 nodes in another region:

eu-central-1
- node0   cass-db-0   10.10.37.79   eu-central-1a
- node1   cass-db-1   10.10.38.229  eu-central-1b
- node2   cass-db-2   10.10.36.76   eu-central-1a

eu-west-1
- node3   cass-db-0   10.10.37.80   eu-west-1a
- node4   cass-db-1   10.10.39.177  eu-west-1b
- node5   cass-db-2   10.10.37.231  eu-west-1a

I have completed the local configuration in cassandra.yaml.

Now, I need to configure cassandra-rackdc.properties and cassandra-topology.properties but I don't understand the network topology.

Please advise.

Upvotes: 2

Views: 1115

Answers (2)

Aaron

Reputation: 57748

Erick provides some great background here, which should be helpful for you. In terms of getting to a simple solution, I'd recommend this:

  • Make sure you're using the GossipingPropertyFileSnitch in the cassandra.yaml.
  • Delete cassandra-topology.properties.
  • Edit cassandra-rackdc.properties and set dc=eu-west-1 for the 3 west nodes; likewise dc=eu-central-1 for the central nodes.
  • Leave the rack at the default, as you only have 3 nodes across 2 availability zones (AZs 1a and 1b).
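For illustration, the two variants of cassandra-rackdc.properties could look roughly like the sketch below. This assumes the default rack name shipped with your install is rack1; check the file on your nodes and keep whatever value is already there. The snitch itself is selected with the endpoint_snitch setting in cassandra.yaml.

```properties
# cassandra-rackdc.properties on each of the 3 eu-west-1 nodes
# (rack1 is assumed to be the shipped default; leave yours unchanged)
dc=eu-west-1
rack=rack1

# cassandra-rackdc.properties on each of the 3 eu-central-1 nodes
# (a separate file on separate hosts, shown here together for comparison)
dc=eu-central-1
rack=rack1
```

Once the nodes are up, running nodetool status on any node should list both data centers with three nodes each.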

If you were using AZs 1a, 1b, and 1c I'd say to use that for the rack property. Erick mentions defining your keyspaces with a RF of 3, which is solid advice. Typically, you'll want the number of AZs to match your RF for even data distribution and availability, which is why I'd recommend leaving rack at the default value for all.

Likewise, your keyspace definitions would look something like this:

CREATE KEYSPACE keyspace_name WITH REPLICATION = 
    {'class':'NetworkTopologyStrategy',
     'eu-west-1':'3',
     'eu-central-1':'3'};

The main point to consider is that your data center names must match between the keyspace definition and the entries in the cassandra-rackdc.properties files.

Upvotes: 1

Erick Ramirez

Reputation: 16303

When you are building a cluster, you would typically start with the network topology first. In your case, your choice of 2 regions indicates to me that you would like to have two logical Cassandra DCs each with 3 nodes.

Network topology

For best practice, we recommend configuring your keyspaces with a replication factor (RF) of 3 in each DC. This means that (a) there are 3 copies of the data, and (b) your cluster is configured for high availability.

With RF:3, you would ideally have the same number of logical C* racks in each DC as the replication factor. In your case this is not possible because you only have 2 AZs, so the topology design means placing all nodes in one logical C* rack.

Snitches

A snitch determines which DCs and racks nodes belong to. There are several snitches to choose from and your choice of snitch will determine which .properties file to configure.

GossipingPropertyFileSnitch (GPFS) automatically updates all nodes using gossip. GPFS is recommended in all cases because it will future-proof your cluster. Unless you have C* expertise and a strong preference for another snitch, it is best practice to stick with GPFS. When using GPFS, you need to define each node's DC and rack in its own cassandra-rackdc.properties file. For details, see GossipingPropertyFileSnitch.

PropertyFileSnitch (PFS) is the precursor to GPFS and determines the network topology based on what you've configured in the cassandra-topology.properties file. With PFS, each node has a full list of all nodes in the cluster, so when you add or remove nodes, you have to update the cassandra-topology.properties file on every single node (details here). This is tedious, which is why users prefer GPFS.
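To show why this gets tedious, here is a rough sketch of what cassandra-topology.properties would have to contain for your six nodes if you did go with PFS. The IPs are taken from your question; the rack name rack1 is an assumption on my part, and every node would need an identical copy of this file.

```properties
# cassandra-topology.properties (PropertyFileSnitch only)
# Format: <node IP>=<DC name>:<rack name>
# rack1 is an assumed rack name, used for every node

# eu-central-1
10.10.37.79=eu-central-1:rack1
10.10.38.229=eu-central-1:rack1
10.10.36.76=eu-central-1:rack1

# eu-west-1
10.10.37.80=eu-west-1:rack1
10.10.39.177=eu-west-1:rack1
10.10.37.231=eu-west-1:rack1
```

With GPFS none of this is needed, since each node only declares its own dc and rack.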

WARNING: If you are not using PropertyFileSnitch, we recommend that you delete the cassandra-topology.properties file on every single node because it's been known to cause intermittent gossip issues as I've documented here -- https://community.datastax.com/questions/4621/.

There are other snitches available (see the docs here) but I won't go through them here since we think GPFS is the right choice in all cases. Cheers!

Upvotes: 2
