Cassandra newbie question regarding clusters

Question

Reading up on Cassandra for a POC. Will be hitting it with Java / Spring. One thing isn't clear at this point. It's a peer-to-peer architecture. So let's say I have 3 nodes: 1.1.1.1, 1.1.1.2 and 1.1.1.3. I get that Cassandra will distribute the data across all 3 nodes and do its replication thing, etc.

Since there is no leader in a data center/cluster, it isn't obvious which node to point a client at. Both cqlsh and the DataStax Cassandra driver in Spring ask for one: cqlsh calls it a server, and the Spring DataStax driver calls it a contact point.

Do I put all 3 IPs as contact points? Or do I just pick one? If I have 10,000 containers and they all connect to .1, that'll probably kill that box, no? What if I have 1,000 nodes? I can't possibly have to put all 1,000 as contact points?

Just trying to see how you are supposed to connect to the cluster. All the docs and tutorials seem to be aimed at a single server. You'd think this would be basic information lol...

Erick Ramirez · Accepted Answer

The quick answer is that any of those options works. You can pick one (any node will do since all Cassandra nodes are equal: no master/slave, no primary/secondary), two, or all three.

There isn't anything special about the contact points. The driver uses the contact points as a way of "contacting" the cluster, an entry point if you will, during the initial connection.

Contact points are addresses of nodes in the Cassandra cluster that a driver uses to discover the cluster topology during the initialisation phase. Once the driver has made the initial connection, it will know about all the other nodes in the cluster, including which racks and DCs they belong to (the topology). Once connected, the driver also listens for topology changes, detecting when nodes are added or decommissioned.

By now, you will have worked out that only one contact point is required, since the driver gets the addresses of the other nodes once it is connected to the cluster. But the general recommendation is to have at least 2 contact points so that if the first one is unavailable for whatever reason, the driver can try another.
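As a rough sketch of what that recommendation looks like in practice, here is a configuration fragment for the DataStax Java driver 4.x (`application.conf` on the classpath), listing two of the three nodes from the question as contact points. The port `9042` and the datacenter name `datacenter1` are assumptions (they are Cassandra's defaults); substitute whatever your cluster actually uses. If you configure the driver through Spring Boot instead, the equivalent property names depend on your Spring Boot version, so check its documentation.

```
# application.conf (DataStax Java driver 4.x)
datastax-java-driver {
  basic {
    # Two entry points are enough: the driver discovers the third node
    # (and any others) from the cluster itself after connecting.
    contact-points = [ "1.1.1.1:9042", "1.1.1.2:9042" ]

    # Assumed datacenter name; must match the DC of the contact points.
    load-balancing-policy.local-datacenter = datacenter1
  }
}
```

Nothing here needs to change when nodes are added to the cluster later, as long as at least one listed contact point stays reachable when the application starts.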

To reiterate, the driver only uses the contact points during the initialisation phase when you start your application. It does not mean the driver will route all requests exclusively to those contact points. The driver will load balance/route requests to all nodes in the cluster. Cheers!
