LaurentD

Reputation: 21

Cassandra V3: repair thread hangs when running on multiple nodes

Tech info: Cassandra version 3.11.4. 2 datacenters, 54 nodes each, with 2 TB on disk. RF: 3 for all keyspaces.

Hello, everyone,

I need some help with a puzzling repair issue on a Cassandra cluster. It seems to be a common problem, but I don't understand it. When we run repairs on multiple nodes of a datacenter at the same time, the repair threads hang and sessions fail with errors like this one:

[2021-12-20 23:07:14,358] Repair session 62e5fee1-6179-11ec-9687-81fe085aa34b for range [(3995347772210991689,4008580965241951449]] failed with error [repair #62e5fee1-6179-11ec-9687-81fe085aa34b on keyspace1/cf1, [(3995347772210991689,4008580965241951449]]] Validation failed in /xx.xx.xxx.52 (progress: 18%)

So, what are we missing here? Is it possible to repair every node in a DC at the same time, like we do, or is our approach fundamentally wrong?

Has anyone managed to run repairs like this correctly?

Any help would be greatly appreciated, as we don't have a clue how to deal with this issue.

Note: We found many questions about this on StackOverflow (for example: "Simultaneous repairs cause repair to hang"). One answer points out that only one node can be repaired at a time, which seems confusing and very inconvenient for a large cluster. Another points to a bug report, but it does not apply to our version (https://issues.apache.org/jira/browse/CASSANDRA-11824).

Can someone share their experience, or point us to the proper documentation page? That would be nice.

L.

PS: Excuse my English, it's not my native language.

Upvotes: 2

Views: 623

Answers (1)

Erick Ramirez

Reputation: 16313

The goal of a repair in Cassandra is to synchronise the data between replicas (nodes), so if you have a replication factor of 3 in a C* data centre, then running a repair on one node will also cause repair operations to run on the nodes holding the other replicas.

If you run repairs on each of those replica nodes in parallel, then you end up with multiple repair operations synchronising (fixing inconsistencies) over the same set of data. Those repairs all compete for the same resources on each of the nodes, and that is why you run into issues like hanging repair threads and validation failures.

We recommend that you perform a rolling repair instead: one node at a time until all nodes are repaired. You only need to run a repair once every gc_grace_seconds, which is 10 days by default, so we suggest scheduling repairs once a week.
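For illustration, here is a minimal sketch of a sequential rolling repair driven from a single admin host. It assumes that host can reach each node with nodetool -h; the hostnames are placeholders, and keyspace1 is just the keyspace from your error message.

    #!/usr/bin/env python3
    # Minimal sketch of a sequential rolling repair: one node at a time.
    # Assumptions: nodetool on this host can reach each node's JMX port
    # via -h <host>; hostnames and keyspace are placeholders.
    import subprocess

    NODES = ["node1.example.com", "node2.example.com", "node3.example.com"]
    KEYSPACE = "keyspace1"  # placeholder keyspace name

    for node in NODES:
        # -pr (primary range) repairs only the ranges this node owns as
        # primary, so each token range is repaired once per rolling cycle.
        print(f"Repairing {node} ...")
        subprocess.run(
            ["nodetool", "-h", node, "repair", "-pr", KEYSPACE],
            check=True,  # stop the cycle if a repair fails
        )
        print(f"Finished {node}")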

For example, if you have a 6-node cluster, then schedule repairs on node 1 on Mondays, node 2 on Tuesdays, and so on. For larger clusters you have to do a bit more work and only run repairs in parallel on nodes which are not "adjacent" to each other in the ring, so that they don't have overlapping token ranges. This is a bit difficult to understand, and therefore to manage, so we recommend that you use an automated tool like Reaper.
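Here is a rough sketch of that weekday schedule for a hypothetical 6-node cluster, meant to be run once a day (for example from cron). Again, the hostnames and the 6-node layout are just placeholders.

    #!/usr/bin/env python3
    # Rough sketch of a weekday-based rolling repair schedule:
    # node 1 on Mondays, node 2 on Tuesdays, and so on.
    # Intended to run daily (e.g. from cron); hostnames are placeholders.
    import datetime
    import subprocess

    SCHEDULE = {
        0: "node1.example.com",  # Monday
        1: "node2.example.com",  # Tuesday
        2: "node3.example.com",  # Wednesday
        3: "node4.example.com",  # Thursday
        4: "node5.example.com",  # Friday
        5: "node6.example.com",  # Saturday
    }

    node = SCHEDULE.get(datetime.date.today().weekday())
    if node is None:
        print("No repair scheduled today")
    else:
        # Primary-range repair so each token range is only repaired once
        # per weekly cycle across the cluster.
        subprocess.run(["nodetool", "-h", node, "repair", "-pr"], check=True)

Cheers!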

Upvotes: 3
