Reputation: 700
I have an Elasticsearch cluster of five nodes. Each node has the same configuration: 5 shards per index and 4 replicas. The idea is that every node will hold every shard.
Four of my nodes each hold five shards, and one of those four holds ALL of the primaries. The fifth node holds NOTHING. And then, of course, I have 5 unassigned shards.
I reload a new index every day and this is exactly how it allocates shards every time.
The goal here is to figure out why the one node gets nothing. That's bad.
It would be easy for me to ask why this is happening, and if anyone knows, that would be fantastic. But since I can't find ANYTHING online or in the documentation that explains it, let me ask instead: how can I diagnose it? Any clues? Anything I can look at that would point to the cause?
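One place to start diagnosing is the cat API, which existed in the 1.x line: list every shard with its state and the node it landed on, then count shards per node. The host/port below are hypothetical, and the saved sample output is fabricated for illustration:

```shell
# Dump shard allocation from any node's HTTP port (adjust host):
# curl -s 'localhost:9200/_cat/shards?v' > shards.txt
#
# Sample of what that output looks like (fabricated for illustration);
# UNASSIGNED rows have no node column:
cat <<'EOF' > shards.txt
myindex 0 p STARTED 25000 1gb 10.0.0.1 Mesa-01
myindex 0 r UNASSIGNED
myindex 1 p STARTED 25000 1gb 10.0.0.2 Mesa-02
EOF

# Count assigned shards per node: column 4 is the shard state, and the
# node name is the last field of each assigned row.
awk '$4 == "STARTED" {count[$NF]++} END {for (n in count) print n, count[n]}' shards.txt
```

A node that never appears in the count, despite being in the cluster, is the one the allocator is skipping.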
EDIT TO ADD - here is my configuration. Every machine looks like this (with the exception of the machine name and the discovery, of course):
#
# Server-specific settings for cluster domainiq-es
#
cluster.name: domainiq-es
node.name: "Mesa-01"
discovery.zen.ping.unicast.hosts: ["m1plfinddev03.prod.mesa1.gdg", "m1plfinddev04.prod.mesa1.gdg", "p3plfinddev03.prod.phx3.gdg", "p3plfinddev04.prod.phx3.gdg"]
#
# The following configuration items should be the same for all ES servers
#
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 4
index.store.type: mmapfs
index.memory.index_buffer_size: 30%
index.translog.flush_threshold_ops: 25000
index.refresh_interval: 30s
bootstrap.mlockall: true
gateway.recover_after_nodes: 4
gateway.recover_after_time: 2m
gateway.expected_nodes: 5
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.timeout: 10s
discovery.zen.ping.retries: 3
discovery.zen.ping.interval: 15s
discovery.zen.ping.multicast.enabled: false
index.search.slowlog.threshold.query.warn: 500ms
index.search.slowlog.threshold.query.info: 200ms
index.search.slowlog.threshold.query.debug: 199ms
index.search.slowlog.threshold.query.trace: 198ms
index.search.slowlog.threshold.fetch.warn: 1s
index.search.slowlog.threshold.fetch.info: 800ms
index.search.slowlog.threshold.fetch.debug: 500ms
index.search.slowlog.threshold.fetch.trace: 200ms
index.indexing.slowlog.threshold.index.warn: 10s
index.indexing.slowlog.threshold.index.info: 5s
index.indexing.slowlog.threshold.index.debug: 2s
index.indexing.slowlog.threshold.index.trace: 500ms
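Nothing in the static config above would exclude a node, but one thing worth ruling out is a transient allocation filter set at runtime via the cluster settings API, since that would keep shards off a named node without appearing in elasticsearch.yml. The host/port and the node name in the sample response are hypothetical:

```shell
# Dump the live cluster settings (adjust host):
# curl -s 'localhost:9200/_cluster/settings?pretty' > settings.json
#
# Sample response with an exclusion in place (fabricated for illustration):
cat <<'EOF' > settings.json
{"persistent":{},"transient":{"cluster.routing.allocation.exclude._name":"Mesa-05"}}
EOF

# Any cluster.routing.allocation.exclude.* key in the response would
# explain an empty node:
grep -o 'cluster.routing.allocation.exclude[^"]*' settings.json
```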
Upvotes: 1
Views: 2189
Thanks to Alcanzar in the comment above, I believe the issue is the one he saw: mismatched versions. The node that refuses shards is running one version earlier than the others. (A newer node can hold shards for older nodes' replicas, but an older node cannot receive shards from a newer one.)
I will upgrade everything to 1.4 this weekend and expect this to go away. It makes total sense now.
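A version skew like this is quick to confirm from the cat API: every node reports its version, and a mismatch shows up as more than one distinct version string. The host/port and the exact versions in the sample are hypothetical:

```shell
# List node name and version from any node's HTTP port (adjust host):
# curl -s 'localhost:9200/_cat/nodes?h=name,version' > nodes.txt
#
# Sample output with one lagging node (fabricated for illustration):
cat <<'EOF' > nodes.txt
Mesa-01 1.4.0
Mesa-02 1.4.0
Phx-01 1.3.4
EOF

# More than one line here means the cluster is running mixed versions:
awk '{print $2}' nodes.txt | sort -u
```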
Upvotes: 1