Reputation: 793
I'm setting up an Elasticsearch (5.0.1) cluster.
It has three master-eligible nodes:
el-m01
el-m02
el-m03
The cluster fails to assemble, and every master-eligible node gets the following NotMasterException in its logs:
[2016-11-21T15:24:13,274][INFO ][o.e.d.z.ZenDiscovery ] [el-m01] failed to send join request to master [{el-m02}{bBhsu3fJSj-MyiWJGhQmog}{_IzdeUd4Sv6g-rhemGjEVQ}{192.168.110.118}{192.168.110.118:9300}{rack=r1}], reason [RemoteTransportException[[el-m02][192.168.110.118:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{el-m02}{bBhsu3fJSj-MyiWJGhQmog}{_IzdeUd4Sv6g-rhemGjEVQ}{192.168.110.118}{192.168.110.118:9300}{rack=r1}] not master for join request]; ], tried [3] times
Enabling debug logging allowed me to understand the following:
The master election is happening, and it succeeds. However, while every node has chosen a master, no node thinks it is the master itself.
What is happening here?
Upvotes: 5
Views: 4300
Reputation: 584
The Elasticsearch data directory ($ES_HOME/data, or in the case of the RPM package, /var/lib/elasticsearch) contains a randomly generated node ID, created when Elasticsearch is first started. If this directory is copied to multiple instances that are expected to form a cluster, the following error is received:
failed to send join request to master [..] IllegalArgumentException [..] found existing node [..] with the same id but is a different node instance
However, when minimum_master_nodes is not met, an error less indicative of the problem is received:
failed to send join request to master [..] NotMasterException [..] not master for join request
GitHub: https://github.com/elastic/elasticsearch/issues/32904
The issue can be resolved by deleting the contents of the data directory; in any case, data directories shouldn't be copied in the first place.
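For reference, a three-master-eligible cluster like the one in the question would normally set minimum_master_nodes to the quorum size, 2, in elasticsearch.yml on each node. This is a 5.x sketch; the unicast host list simply reuses the hostnames from the question:

```yaml
# elasticsearch.yml on each master-eligible node (Elasticsearch 5.x).
# Quorum for 3 master-eligible nodes: (3 / 2) + 1 = 2.
discovery.zen.minimum_master_nodes: 2
# Seed hosts so the nodes can find each other (names from the question):
discovery.zen.ping.unicast.hosts: ["el-m01", "el-m02", "el-m03"]
```

Note that this setting was removed in Elasticsearch 7.x, where quorum handling became automatic.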
Upvotes: 1
Reputation: 793
Here is the situation: because all the masters were created by cloning a single VM, every node has the same node ID.
This can be verified with the following command, which lists all node IDs:
GET /_cat/nodes?v&h=id,ip,name&full_id=true
Note that since your cluster hasn't formed, each node needs to be queried individually, i.e.:
curl "192.168.110.111:9200/_cat/nodes?v&h=id,ip,name&full_id=true"
curl "192.168.110.112:9200/_cat/nodes?v&h=id,ip,name&full_id=true"
(...)
This is bad: node IDs need to be unique.
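A quick way to spot duplicates across the not-yet-formed cluster is to collect the ID reported by each host and print only the repeats. This is a sketch using the two IPs shown above; extend the list to cover all your masters:

```shell
# Query each node individually (the cluster hasn't formed yet) and keep
# only the node ID column; 'uniq -d' prints any ID seen more than once.
for ip in 192.168.110.111 192.168.110.112; do
  curl -s "http://$ip:9200/_cat/nodes?h=id&full_id=true"
done | sort | uniq -d
# Any output at all means at least two nodes share a node ID.
```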
To solve this situation, you need to delete the contents of the data directory (/var/lib/elasticsearch) on every node. This will delete all data stored in Elasticsearch, and will also reset the node IDs.
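On each node, that looks roughly like the following. This is a sketch assuming the RPM install (systemd service, /var/lib/elasticsearch data path); adjust the paths for your setup, and remember it destroys all index data on the node:

```shell
# Run on EVERY master node. Destroys all Elasticsearch data on this node
# and forces a fresh node ID to be generated on the next start.
sudo systemctl stop elasticsearch
sudo rm -rf /var/lib/elasticsearch/nodes
sudo systemctl start elasticsearch
```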
To avoid having this problem in the first place, you can:
Upvotes: 19