A-y

Reputation: 793

Nodes won't join cluster : NotMasterException (Weird master election bug)

I'm setting up an Elasticsearch (5.0.1) cluster.

It has three master-eligible nodes:

el-m01
el-m02
el-m03

The cluster fails to assemble, and every master node gets the following NotMasterException in its logs:

[2016-11-21T15:24:13,274][INFO ][o.e.d.z.ZenDiscovery     ] [el-m01] failed to send join request to master [{el-m02}{bBhsu3fJSj-MyiWJGhQmog}{_IzdeUd4Sv6g-rhemGjEVQ}{192.168.110.118}{192.168.110.118:9300}{rack=r1}], reason [RemoteTransportException[[el-m02][192.168.110.118:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{el-m02}{bBhsu3fJSj-MyiWJGhQmog}{_IzdeUd4Sv6g-rhemGjEVQ}{192.168.110.118}{192.168.110.118:9300}{rack=r1}] not master for join request]; ], tried [3] times

Enabling debug logging allowed me to understand the following:

The master election happens, and it succeeds. However, while every node has chosen a master, no node thinks it is the master itself.

What is happening here?

Upvotes: 5

Views: 4300

Answers (2)

PhaedrusTheGreek

Reputation: 584

The Elasticsearch data directory ($ES_HOME/data, or for package installs typically /var/lib/elasticsearch) contains a randomly generated node ID, created the first time Elasticsearch starts. If this directory is copied to multiple instances that are expected to form a cluster, the following error will be received:

failed to send join request to master [..] IllegalArgumentException [..] found existing node [..] with the same id but is a different node instance

However, when minimum_master_nodes is not met, an error less indicative of the problem is received:

failed to send join request to master [..] NotMasterException [..] not master for join request

GitHub: https://github.com/elastic/elasticsearch/issues/32904

The issue can be resolved by deleting the contents of the data directory; in any case, data directories shouldn't be copied between nodes in the first place.

Upvotes: 1

A-y

Reputation: 793

Here is the situation: because all the master nodes were created by cloning a single VM, every node ended up with the same node ID.

This can be verified with the following command, which lists all node IDs:

GET /_cat/nodes?v&h=id,ip,name&full_id=true

Note that since your cluster hasn't formed, each node needs to be queried individually, e.g.:

curl '192.168.110.111:9200/_cat/nodes?v&h=id,ip,name&full_id=true'
curl '192.168.110.112:9200/_cat/nodes?v&h=id,ip,name&full_id=true'
(...)
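One way to spot the duplication at a glance (a sketch; the IP list is hypothetical, substitute your own nodes) is to collect just the `id` column from each node and print any ID that occurs more than once:

```shell
# Query each node individually, keeping only the node ID column,
# then print any ID reported by more than one node.
# The IPs below are placeholders - replace them with your own.
for ip in 192.168.110.111 192.168.110.112 192.168.110.113; do
  curl -s "http://$ip:9200/_cat/nodes?h=id&full_id=true"
done | sort | uniq -d
```

Any output at all means at least two nodes share a node ID; a healthy set of nodes prints nothing.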

This is bad: node IDs need to be unique.

To solve this situation, you need to delete the indices (in /var/lib/elasticsearch) on every node. This deletes all data held in Elasticsearch, and it also resets the node IDs.
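As a minimal recovery sketch, assuming a package-based install managed by systemd with the default data path (adjust the service name and path to your layout):

```shell
# Run on every affected node. WARNING: this deletes all Elasticsearch data.
sudo systemctl stop elasticsearch
sudo rm -rf /var/lib/elasticsearch/nodes   # node ID state lives under here
sudo systemctl start elasticsearch         # a fresh node ID is generated on start
```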

To avoid having this problem in the first place, you can:

  • A. Install Elasticsearch after having cloned the VMs.
  • B. Use an automated tool like Ansible or Puppet to manage Elasticsearch.

Upvotes: 19
