Reputation: 1388
To all Cassandra experts,
I am trying to understand Cassandra's failure detection and recovery. I am a little confused about how exactly it works.
From Datastax Doc:
Configuring the phi_convict_threshold property adjusts the sensitivity of the failure detector. Lower values increase the likelihood that an unresponsive node will be marked as down, while higher values decrease the likelihood that transient failures will cause a node to be marked as down. In unstable network environments (such as EC2 at times), raising the value to 10 or 12 helps prevent false failures.
From http://ljungblad.nu/post/44006928392/cassandra-and-its-accrual-failure-detector
Phi represents the likelihood that Node A is wrong about Node B’s state. The higher the Phi, the bigger the confidence that Node B has failed.
Can someone explain in detail the C* failure detection mechanism and how C* recovers in different scenarios?
Thanks in advance
Chaity
Upvotes: 4
Views: 2198
Reputation: 3760
I don't consider myself a Cassandra expert, but here is my take on Cassandra's node failure detection:
All of these communication methods work together when nodes go offline or are performing poorly, and they can be configured. As far as I know, Cassandra will not bring a node back to life after it fails; someone has to bring the node back online and then run nodetool repair to repair the data on the failed node.
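To make the phi value from the question a bit more concrete, here is a minimal Python sketch of an accrual failure detector. It is not Cassandra's actual code: the class and method names are made up for illustration, and it assumes heartbeat gaps follow an exponential distribution, so phi is simply the time since the last heartbeat divided by the mean gap, converted to a base-10 log scale. The threshold argument plays the role of phi_convict_threshold (8 by default in Cassandra).

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual failure detector (hypothetical, for illustration only)."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold              # stands in for phi_convict_threshold
        self.intervals = deque(maxlen=window)   # recent gaps between heartbeats
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: grows the longer the node stays silent."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Assumed model: P(silence >= elapsed) = exp(-elapsed / mean),
        # and phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_down(self, now):
        """Convict the node once phi exceeds the threshold."""
        return self.phi(now) > self.threshold


detector = PhiAccrualDetector(threshold=8.0)
for t in range(10):
    detector.heartbeat(float(t))    # one heartbeat per second, at t = 0..9

print(detector.is_down(10.0))       # 1 s of silence
print(detector.is_down(30.0))       # 21 s of silence
```

With one heartbeat per second, a single second of silence gives a phi well below 8, while about 21 seconds of silence pushes it just past 8. Raising the threshold to 12, as the quoted doc suggests for flaky networks, means the same 21 seconds of silence would no longer convict the node.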
Depending on your organization's failure tolerance for read and write operations, you can always configure the consistency level.
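As a rough way to reason about that trade-off, the sketch below works out how many replica failures a request can survive at a given consistency level. The function names are my own, and it assumes a single datacenter and only the basic levels (ONE, TWO, THREE, QUORUM, ALL), with QUORUM meaning a majority of replicas.

```python
def required_replicas(consistency_level, replication_factor):
    """Replicas that must respond for the request to succeed."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        return fixed[consistency_level]
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1   # majority of replicas
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def tolerated_failures(consistency_level, replication_factor):
    """Replicas that can be down while requests still succeed."""
    needed = required_replicas(consistency_level, replication_factor)
    if needed > replication_factor:
        raise ValueError("consistency level needs more replicas than exist")
    return replication_factor - needed


for level in ("ONE", "QUORUM", "ALL"):
    print(level, tolerated_failures(level, 3))
```

So with a replication factor of 3, QUORUM keeps working with one replica down, ONE with two down, and ALL fails as soon as any replica is unavailable.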
Some resources for managing node failure:
Upvotes: 3