Bing Wu

Reputation: 348

Cassandra repair: why do I have to run -pr on every host in the cluster?

According to every source I have read on Cassandra's manual anti-entropy repair, such as this one, if I use nodetool's partitioner range (-pr) option, I need to run it on every host in the cluster:

Note: If you use this option, you must run nodetool repair -pr on every node in the cluster to repair all data. Otherwise, some ranges of data will not be repaired.

Unless the note above is specifically about using -pr to repair the entire cluster, it doesn't make sense to me. When nodetool repair runs without this option, it repairs every range stored on that node: with RF=3, that is the node's primary range plus the secondary and tertiary ranges it replicates for neighboring nodes. With the -pr option, it repairs only the node's primary range.

So in a ring with nodes A, B, C, D, E, F, if my goal is to repair C, wouldn't running "nodetool repair" on C alone be the same as running "nodetool repair -pr" on A, B, and C?

Upvotes: 0

Views: 132

Answers (1)

Alec Collier

Reputation: 1523

My understanding is that if you run nodetool repair -pr on node C, only the ranges for which C is the primary node are repaired, not the ones for which C is a secondary or tertiary replica.
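To make that concrete, here is a minimal Python sketch of a toy ring. It is only a model, not Cassandra code: it assumes SimpleStrategy-style placement (each range lives on its primary node and the next RF-1 nodes clockwise), one token per node (no vnodes), and it names each range after its primary node. The function names are hypothetical.

```python
NODES = ["A", "B", "C", "D", "E", "F"]  # ring order, one token per node (assumption)
RF = 3                                  # replication factor from the question


def replicas(primary, nodes=NODES, rf=RF):
    """Nodes that store the range whose primary owner is `primary`."""
    i = nodes.index(primary)
    return [nodes[(i + k) % len(nodes)] for k in range(rf)]


def ranges_held_by(node, nodes=NODES, rf=RF):
    """Ranges (named by their primary owner) that `node` stores a replica of."""
    return [p for p in nodes if node in replicas(p, nodes, rf)]


def ranges_repaired(node, primary_only, nodes=NODES, rf=RF):
    """Ranges covered by a repair started on `node` in this toy model."""
    return [node] if primary_only else ranges_held_by(node, nodes, rf)


print(ranges_repaired("C", primary_only=False))  # ['A', 'B', 'C']
print(ranges_repaired("C", primary_only=True))   # ['C']
```

In this model, a full repair started on C covers the ranges owned by A, B, and C, which is the same set you would cover by running -pr on A, B, and C, so the asker's intuition holds for the single-node case.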

So the note is saying that if you want to repair the entire cluster, you need to run it on every node. That is what most people intend: they want the whole cluster kept repaired on a regular basis, and they don't think in terms of individual nodes.
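The cluster-wide case can be sketched the same way. The following Python snippet is a rough illustration under the same assumptions (SimpleStrategy-style placement, one token per node, ranges named after their primary node, hypothetical helper names). It counts how many times each range is repaired depending on which nodes the repair is started on.

```python
from collections import Counter

NODES = ["A", "B", "C", "D", "E", "F"]  # ring order, one token per node (assumption)
RF = 3


def held_ranges(node):
    """Ranges a node stores: its own plus those of the RF-1 preceding nodes."""
    i = NODES.index(node)
    return [NODES[(i - k) % len(NODES)] for k in range(RF)]


def coverage(run_on, primary_only):
    """Count how often each range is repaired when repair is started on `run_on`."""
    counts = Counter()
    for node in run_on:
        counts.update([node] if primary_only else held_ranges(node))
    return dict(counts)


pr_everywhere = coverage(NODES, primary_only=True)
full_everywhere = coverage(NODES, primary_only=False)
pr_partial = coverage(["A", "B", "C"], primary_only=True)

print(pr_everywhere)    # every range repaired exactly once
print(full_everywhere)  # every range repaired RF times
print(pr_partial)       # ranges owned by D, E, F are never repaired
```

In this toy model, -pr on every node touches each range once, a full repair on every node repeats the work RF times, and -pr on only some nodes leaves the other nodes' primary ranges unrepaired, which is what the documentation note warns about.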

Upvotes: 1
