Reputation: 3214
Background:
We are creating a Cassandra cluster that spans 3 geographically separated datacenters. We plan to have 2 Cassandra nodes in each datacenter (2 nodes x 3 sites = 6 nodes total). All 6 nodes will be part of the same cluster.
The idea is to be able to write data to any node in the cluster and read it from any other node. [We can tolerate a 1-second delay in updates.]
The Question:
How do we design a client to write to the "cluster"? Cassandra does not have a router or middle layer like MongoDB. Do we design the client to write to any node in the ring? If so, what if that node is down? (I.e., do we need to make our client aware of all the node IPs in the cluster?)
Thank you.
Upvotes: 2
Views: 1089
Reputation: 6932
You can read or write from any node in the cluster; they are all capable of routing requests to the correct nodes (the node performing the routing is typically referred to as the "coordinator" for an operation). You should try to balance your requests over all nodes in the local datacenter, only using nodes in the remote datacenters if all local nodes are down. Most Cassandra clients will spread requests in a round-robin fashion across all of the nodes that you point them at, and as Canausa mentions, some auto-discover other nodes and sometimes use more sophisticated algorithms for picking which node to send a request to.
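A minimal sketch of that datacenter-aware, round-robin coordinator selection, in Python with hypothetical node IPs (real clients such as Hector or the DataStax drivers implement this for you):

```python
from itertools import cycle

# Hypothetical node addresses for illustration; real drivers
# discover these from the cluster itself.
LOCAL_NODES = ["10.0.1.1", "10.0.1.2"]           # nodes in the local datacenter
REMOTE_NODES = ["10.1.1.1", "10.1.1.2",
                "10.2.1.1", "10.2.1.2"]          # nodes in the remote datacenters

_local = cycle(LOCAL_NODES)
_remote = cycle(REMOTE_NODES)

def pick_coordinator(local_nodes_up):
    """Round-robin over local nodes; fall back to a remote node only
    if no local node is up."""
    if local_nodes_up:
        # advance the local rotation until we hit a live node
        for _ in range(len(LOCAL_NODES)):
            node = next(_local)
            if node in local_nodes_up:
                return node
    return next(_remote)
```

Each request goes to the next live local node; only when the local list is exhausted does the client reach across the WAN.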
Writes to any datacenter are automatically replicated to all other datacenters, so you can indeed write to any node and read from any node. Typically, you will want to use consistency level LOCAL_QUORUM for reads and writes, which requires that a quorum of replicas in the local data center respond for the operation to be considered a success. You can also consider writing at EACH_QUORUM, which waits for a response from a quorum of replicas in each datacenter. Obviously, the latency will be much higher in this case, but you can achieve strong consistency across all datacenters.
However, with only 2 nodes in each datacenter, a quorum of replicas is equivalent to all replicas, so if any node goes down, you will lose availability for that portion of the data. For this reason, if you want to use quorum consistency levels, it's recommended that you have a replication factor of at least 3 in each datacenter, allowing for the loss of one replica while still maintaining strong (or locally strong) consistency.
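The arithmetic behind that recommendation: Cassandra requires floor(RF/2) + 1 replica acknowledgements for a quorum-level operation. A small illustration:

```python
def quorum(replication_factor):
    """Replica acknowledgements required for a QUORUM (or, per
    datacenter, LOCAL_QUORUM) operation."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)

# RF=2 per DC: quorum is 2 of 2, so losing any replica blocks quorum ops.
# RF=3 per DC: quorum is 2 of 3, so one replica can be down.
```

With RF=2 the quorum equals the full replica set, which is why a single node failure costs you availability; with RF=3 you can lose one replica per datacenter and keep serving quorum reads and writes.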
Upvotes: 3
Reputation: 11
Most Cassandra clients will attempt to connect to one node and, if that fails, try the next one in the list. Some will even make a connection, figure out the layout of the cluster, and pick the right node to connect to. Clients like Hector will keep connections open to particular nodes in case you ever need to request more information, which avoids the expensive connection setup.
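That connect-and-failover behaviour can be sketched with plain sockets (hypothetical endpoints; a real client would speak the Cassandra protocol after connecting, and older clients like Hector use Thrift on port 9160 rather than the native port 9042):

```python
import socket

def connect_with_failover(endpoints, timeout=2.0):
    """Try each (host, port) contact point in order and return a socket
    to the first node that accepts the connection."""
    last_error = None
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            last_error = err  # remember the failure and try the next node
    raise ConnectionError(f"no contact point reachable: {last_error}")

# Hypothetical contact points, one per datacenter.
CONTACT_POINTS = [("10.0.1.1", 9042), ("10.1.1.1", 9042), ("10.2.1.1", 9042)]
```

Listing at least one contact point per datacenter means the client can still get in even if an entire site is unreachable.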
Upvotes: 1
Reputation: 10250
Any Cassandra client can send a request to any node in the cluster; that node will route the request and consolidate the results from both itself and other nodes.
In your scenario, you might consider a replication factor of 3 with one replica per data center, each node owning half of the data, and use QUORUM for both reads and writes. Configure your replica placement strategy so that each write updates one node in each data center (e.g. A1, A2, A3, B1, B2, B3). That way, each data center has all the data at its disposal, and a read or write request will return control as soon as it receives a response from the first responding data center in addition to its own.
You do not need to make your client aware of all nodes, but it should be aware of at least two to prevent a single point of failure. I'd go with at least one from each data center.
Upvotes: 0