hammady

How to force a leader on SolrCloud?

I have a 5-node SolrCloud (Solr 7.0) with an external 3-node ZooKeeper ensemble. It has one collection, "production", split into 5 shards with a replication factor of 5. See the screenshot below:

[screenshot not available]

For a long time shard5 struggled to elect a new leader, and the other cores kept logging the following error:

azsolr1 solr: 2018-08-28 19:32:43.575 ERROR (qtp1124317168-9304) [c:production s:shard2 r:core_node9 x:production_shard2_replica_n4] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms , collection: production slice: shard5
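One way to confirm which shards are missing a leader is to ask the Collections API for CLUSTERSTATUS and check, shard by shard, whether any replica is flagged as leader. The following is a minimal Python sketch, assuming the response uses the usual cluster / collections / shards / replicas nesting with "leader": "true" on the leader replica. The base URL is a placeholder, and fetch_status only works against a reachable node.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cluster_status_url(base_url, collection):
    """Build a Collections API CLUSTERSTATUS URL for one collection."""
    query = urlencode({"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"})
    return f"{base_url}/admin/collections?{query}"


def shards_without_leader(status, collection):
    """Return the sorted names of shards in which no replica is marked as leader."""
    shards = status["cluster"]["collections"][collection]["shards"]
    missing = []
    for shard_name, shard in shards.items():
        replicas = shard.get("replicas", {}).values()
        # Assumption: the leader replica carries the string "leader": "true".
        if not any(r.get("leader") == "true" for r in replicas):
            missing.append(shard_name)
    return sorted(missing)


def fetch_status(base_url, collection, timeout=30):
    """Fetch CLUSTERSTATUS from a live node (requires network access)."""
    with urlopen(cluster_status_url(base_url, collection), timeout=timeout) as resp:
        return json.load(resp)
```

In the situation described above, calling shards_without_leader(fetch_status("http://azsolr1:8983/solr", "production"), "production") would be expected to report shard5.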

I restarted all nodes one by one (I even restarted the ZooKeeper nodes), but the only active replica (on azsolr1) still was not elected leader. I then unloaded the 4 replicas in the 'down' state using the CoreAdmin API UNLOAD command, which made them disappear completely.
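CoreAdmin commands act on a single node, so each UNLOAD request has to go to the node that actually hosts the core, and it takes the core name (the x: value in the log line, such as production_shard2_replica_n4), not the core_node name. A minimal Python sketch that only builds the request URLs; the node URLs and core names in the usage lines are hypothetical.

```python
from urllib.parse import urlencode


def unload_core_url(node_url, core_name, delete_index=False):
    """Build a CoreAdmin UNLOAD URL for one core on the node that hosts it."""
    params = {"action": "UNLOAD", "core": core_name}
    if delete_index:
        # Optionally also delete the index files; off by default so data stays on disk.
        params["deleteIndex"] = "true"
    return f"{node_url}/admin/cores?{urlencode(params)}"


# Hypothetical (node, core) pairs for the replicas that were stuck in the 'down' state.
down_replicas = [
    ("http://azsolr2:8983/solr", "production_shard5_replica_n10"),
    ("http://azsolr3:8983/solr", "production_shard5_replica_n12"),
]
for node_url, core in down_replicas:
    print(unload_core_url(node_url, core))
```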

In this state, forcing a leader for the shard with the Collections API FORCELEADER action does nothing. I had also tried it before unloading the cores.

Here is the current status:

[screenshot not available]

Why can't Solr simply elect the only active replica of shard5 as leader? Isn't that the obvious choice, especially after FORCELEADER has been called on the shard?

Assuming a leader does eventually get elected, should I recreate the deleted replicas with the Collections API ADDREPLICA action? If so, should I reuse the instanceDir and dataDir of the deleted replicas, or just let them replicate from scratch?


Answers (2)

Alihossein shahabi

I had the same problem: one collection with 3 replicas (solr1, which had previously been the leader, plus solr2 and solr3), and one of the shards had no leader. These are the steps I took:

1 - Stop solr2 and solr3.

2 - Call the FORCELEADER API: http://xx.xx.xxx.xx:8983/solr/admin/collections?action=FORCELEADER&collection=your_collection_name&shard=shard1

3 - After a few minutes, solr1 was elected leader.
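Step 2 of the steps above can be scripted with Python's standard library. This is a minimal sketch: force_leader_url reproduces the request from step 2, the base URL, collection and shard are the same placeholders used there, and the timeout is an arbitrary assumption. Stopping the other nodes (step 1) and waiting for the election (step 3) are left manual.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def force_leader_url(base_url, collection, shard):
    """Build a Collections API FORCELEADER URL for one shard."""
    query = urlencode({"action": "FORCELEADER", "collection": collection, "shard": shard})
    return f"{base_url}/admin/collections?{query}"


def force_leader(base_url, collection, shard, timeout=60):
    """Send FORCELEADER and return the raw response body (requires network access)."""
    with urlopen(force_leader_url(base_url, collection, shard), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```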


hammady

Restarting azsolr1, which hosted the only replica of shard5, forced the leader election. It sounds crazy, but that was all it took. Afterwards, I added the other 4 replicas back with the ADDREPLICA command.
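Re-adding the four replicas can be expressed as one ADDREPLICA request per target node. The sketch below only builds the URLs; the node names are hypothetical (they assume Solr's usual host:port_solr live-node naming), and it deliberately omits instanceDir and dataDir, so each new replica gets default directories and is expected to copy its index from the leader rather than reuse the old files.

```python
from urllib.parse import urlencode


def add_replica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA URL; without node, Solr picks one itself."""
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        params["node"] = node
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical live-node names for the four hosts that lost their shard5 replica.
TARGET_NODES = [
    "azsolr2:8983_solr",
    "azsolr3:8983_solr",
    "azsolr4:8983_solr",
    "azsolr5:8983_solr",
]

urls = [add_replica_url("http://azsolr1:8983/solr", "production", "shard5", n) for n in TARGET_NODES]
for url in urls:
    print(url)
```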
