Reputation: 2721
I'm currently upgrading an Elasticsearch cluster to 6.1 and have been unable to get the nodes to discover each other at startup. Each of the three nodes boots but then gets stuck in a loop:
[2018-01-08T11:33:01,421][WARN ][o.e.d.z.ZenDiscovery ] [ip-10-xxx-xxx-xxx] not enough master nodes discovered during pinging (found [[Candidate{node={ip-10-xxx-xxx-xxx}{gMlxxxxxRW-74axxxQ8V-3x}{6gBBYZxxxxxxxon=-1}]], but needed [2]), pinging again
The relevant part of my configuration is:
# Use the AWS private IP as self identifier
http.host: _ec2:privateIp_
network.host: _ec2:privateIp_
http.bind_host: 0.0.0.0
network.bind_host: 0.0.0.0
discovery.zen.hosts_provider: ec2
# These are expanded in my CloudFormation template
discovery.ec2.tag.Stack: @@STACK
discovery.ec2.tag.App: @@APP
discovery.ec2.tag.Stage: @@STAGE
Switching on trace logging for discovery (using logger.org.elasticsearch.discovery.ec2: "TRACE") nets me some evidence that the discovery process is failing:
[2018-01-08T11:32:58,419][TRACE][o.e.d.e.AwsEc2UnicastHostsProvider] [ip-10-xxx-xxx-xxx] building dynamic unicast discovery nodes...
[2018-01-08T11:32:58,420][DEBUG][o.e.d.e.AwsEc2UnicastHostsProvider] [ip-10-xxx-xxx-xxx] using dynamic discovery nodes []
Upvotes: 3
Views: 707
Reputation: 2721
After further debugging I discovered that the documentation is not correct.
The documentation for the endpoint setting says: "The ec2 service endpoint to connect to. This will be automatically figured out by the ec2 client based on the instance location, but can be specified explicitly."
Unfortunately this isn't true and there is an open issue at https://github.com/elastic/elasticsearch/issues/27464.
When troubleshooting further I switched on AWS request logging using logger.com.amazonaws.request: "DEBUG" in my Elasticsearch config. This produced a log entry showing that the plugin was contacting us-east-1 despite the instance being in eu-west-1:
[2018-01-08T12:26:40,029][DEBUG][c.a.request ] Sending Request: POST https://ec2.us-east-1.amazonaws.com / Parameters: ({"Action":["DescribeInstances"],"Version":["2016-11-15"] ...<snip>
It looks like the maintainers are aware of this and are likely to fix the plugin so that its behaviour matches the documentation (see https://github.com/elastic/elasticsearch/issues/27924). In the meantime, the fix is to set the endpoint explicitly, using something like:
discovery.ec2.endpoint: ec2.eu-west-1.amazonaws.com
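For reference, here is a rough sketch of how the relevant settings might sit together in elasticsearch.yml. It only combines settings already mentioned in this thread. The region in the endpoint is an assumption that has to match wherever your instances run, and the two logger lines are only needed while troubleshooting.

```yaml
# Sketch only: combines the settings from the question and this answer.
discovery.zen.hosts_provider: ec2

# Workaround: set the EC2 endpoint explicitly instead of relying on auto-detection.
# Assumption: the instances run in eu-west-1; substitute your own region.
discovery.ec2.endpoint: ec2.eu-west-1.amazonaws.com

# Optional, for troubleshooting only: log what the discovery plugin finds
# and which AWS endpoint the requests are sent to.
logger.org.elasticsearch.discovery.ec2: "TRACE"
logger.com.amazonaws.request: "DEBUG"
```

With the request logger enabled, the "Sending Request" line should show the endpoint you configured rather than ec2.us-east-1.amazonaws.com.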
Upvotes: 5