
php architecture cassandra high-availability

Drew Anderson

Reputation: 542

Apache Cassandra High Availability

Assuming a typical separation of concerns for a typical web-service:

Multiple Client-API machines (Apache2 Web Server, Java/PHP, custom code)
Cassandra Storage Cluster
5+ Client-API machines per Cassandra Node

What are the high-availability features of Cassandra to ensure uptime for the Client-API (custom code)?

Typical solutions would involve:

an internal load-balancer with health monitoring (normal load-balancer HA applies here)
or multiple backup node IPs configured in the client library, which attempts connections to them at random or in sequence
or simulating that client-library behaviour in application code (try multiple nodes until one connects)

However I cannot seem to find much mention of this, including mention of "best practice" or "this is what I've done".

To be specific I am currently learning about Cassandra and I am interested in introducing it to a Zend Framework (PHP) project and would like to know the best practice for High Availability connections to Cassandra from multiple Client-API machines.

One-time failures can be managed, but service downtime due to individual failed nodes is obviously not ideal.

Also, bonus points for explaining how Split-Brain is managed in Cassandra in a High Availability environment as described above.

Upvotes: 1

Answers (1)

Mata

Reputation: 439

Cassandra supports fault tolerance / HA by design. To understand how, read about hinted handoff and message routing in Cassandra.

For split-brain handling, you could consider the Cages Java library, which provides distributed synchronization functionality (locks etc.).

From Cassandra - A Decentralized Structured Storage System:

Cassandra uses replication to achieve high availability and durability. Each data item is replicated at N hosts, where N is the replication factor configured "per-instance". Each key, k, is assigned to a coordinator node. The coordinator is in charge of the replication of the data items that fall within its range. In addition to locally storing each key within its range, the coordinator replicates these keys at the N-1 nodes in the ring. Cassandra provides the client with various options for how data needs to be replicated. Cassandra provides durability guarantees in the presence of node failures and network partitions by relaxing the quorum requirements.
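To make the quoted replication scheme a bit more concrete, here is a minimal sketch of "coordinator plus the next N-1 nodes in the ring". It is an illustration only, not Cassandra's actual code: the function name, the tiny integer tokens and the rule that a node owns the range ending at its own token are assumptions for the example, and it ignores racks and datacenters entirely.

```python
import bisect


def replicas_for_token(key_token, ring, n):
    """Return the nodes that hold a key under a simple ring walk.

    ring is a list of (token, node_name) pairs sorted by token.
    The first node whose token is >= key_token plays the coordinator
    role from the paper; the following n - 1 nodes hold the copies.
    """
    tokens = [t for t, _ in ring]
    # Wrap around to the first node when the key is past the last token.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    count = min(n, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    # Hypothetical four-node ring with made-up tokens.
    ring = [(0, "node1"), (100, "node2"), (200, "node3"), (300, "node4")]
    print(replicas_for_token(150, ring, 3))  # ['node3', 'node4', 'node1']
```

With N = 3, any single node can fail and two copies of every key are still reachable, which is what the scenarios further down rely on.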

If a client connects to a random node in the cluster, say node1, the following scenarios are possible:

READ

[SUCCESS] node1 is UP and also has the requested data.

[SUCCESS] node1 is UP but does not have the requested data, so it acts as the coordinator node and routes the request to a replica that has the data, say node2. Assume node2 is UP and can serve the request.

[NODE DOWN] node1 is DOWN and had the requested data. The client cannot reach node1 and can connect to another node in the cluster (UnavailableException is what a live coordinator returns when too few replicas are up). If the minimum number of replicas needed to serve the query can respond, the read succeeds.

[REPLICA NODE DOWN] node1 is UP but does not have the requested data, so it acts as the coordinator node and routes the request to a replica that has the data, say node2. Assume node2 is DOWN and cannot serve the request. If the other replicas are alive and can serve the request, the read succeeds. If the replica was up when the request was sent but went down immediately afterwards, TimedOutException is thrown. The client can then connect to another node in the cluster.
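The "minimum number of replicas" in the read cases is set by the consistency level chosen for the request. As a rough sketch of that arithmetic, the snippet below covers only the ONE, QUORUM and ALL levels; the helper names are made up for this example and are not part of any Cassandra client API.

```python
def required_replicas(level, replication_factor):
    """How many replicas must answer for the given consistency level."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + level)


def can_serve(level, replication_factor, live_replicas):
    """True if enough replicas are alive to satisfy the request."""
    return live_replicas >= required_replicas(level, replication_factor)


if __name__ == "__main__":
    # Replication factor 3 with one replica down (2 alive).
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, can_serve(level, 3, 2))
```

So with a replication factor of 3 and one replica down, requests at ONE or QUORUM still go through, while ALL is the level that would fail.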

WRITE

[SUCCESS] node1 is UP and is also supposed to store the requested data.

[SUCCESS] node1 is UP but does not own the token range of the requested data (it has no responsibility for storing this piece of data), so it acts as the coordinator node and routes the request to the replica that is supposed to store the data, say node2. Assume node2 is UP and can serve the request.

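[NODE DOWN] node1 is DOWN and is also supposed to store the requested data. The client cannot reach node1 and can connect to another node in the cluster (UnavailableException is what a live coordinator returns when too few replicas are up). Because node1 is down, a hint for the write is stored on a live node so that it can be replayed through hinted handoff when node1 comes back.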

[REPLICA NODE DOWN] node1 is UP but does not own the token range of the requested data (it has no responsibility for storing this piece of data), so it acts as the coordinator node and routes the request to the replica that is supposed to store the data, say node2. Assume node2 is DOWN and cannot serve the request. If the other replicas are alive and can serve the request, the write succeeds. The hinted handoff will be written on the replica/coordinator node.
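In both the read and the write cases the client-side reaction is the same: if the node you picked does not answer, try another one. A minimal sketch of that "try multiple nodes until one connects" loop could look like the following. The `connect` callable stands in for whatever your client library uses to open a connection, and `AllNodesDown` is a made-up exception for the example, so treat this as an outline rather than a drop-in implementation.

```python
import random


class AllNodesDown(Exception):
    """Raised when no node in the list accepted a connection."""

    def __init__(self, errors):
        super().__init__("no Cassandra node reachable: %r" % (errors,))
        self.errors = errors


def connect_with_failover(nodes, connect, shuffle=True):
    """Try each node until one connects; return (node, connection).

    nodes   -- list of host names or IPs (hypothetical values)
    connect -- callable that opens a connection to one node and raises
               OSError (e.g. ConnectionRefusedError) on failure
    shuffle -- randomise the order so clients spread over the cluster
    """
    candidates = list(nodes)
    if shuffle:
        random.shuffle(candidates)
    errors = {}
    for node in candidates:
        try:
            return node, connect(node)
        except OSError as exc:
            errors[node] = exc  # remember why this node failed, move on
    raise AllNodesDown(errors)
```

Shuffling the list on each client machine is a cheap way to avoid every Client-API machine piling onto the first node; an internal load-balancer with health checks achieves much the same thing outside the application.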

Upvotes: 1
