Mike Cantrell

Reputation: 726

Elasticsearch nodes not participating in indexing

Background

With our Elasticsearch nodes, I've noticed very high CPU usage relative to I/O throughput when indexing documents (queries seem to be OK). I was able to increase throughput by scaling vertically (adding more CPUs to the servers), but I wanted to see what kind of increase I would get by scaling horizontally (doubling the number of nodes from 2 to 4).

Problem

I expected throughput to increase with the larger cluster, but performance was actually a little worse. I also noticed that half of the nodes reported very little I/O and CPU usage.

Research

I saw that the primary shard distribution was uneven, so I moved some of the shards around using the cluster reroute API. This had no real effect other than changing which two nodes were being used.

The _search_shards API indicates that all nodes and shards should participate.

Question

I'm not sure why only two nodes are participating in indexing. Once a document has been indexed, is there a way to see which shard it resides in? Is there something obvious that I'm missing?

Setup

Stats

OS Metrics

JVM Metrics

I/O Metrics

Shards (columns: index, shard, primary (p) or replica (r), state, docs, store size, IP, node)

files-v2           4 r STARTED  664644   8.4gb 10.240.219.136 es-qa-03
files-v2           4 p STARTED  664644   8.4gb 10.240.211.15  es-qa-01
files-v2           7 r STARTED  854807  10.5gb 10.240.53.190  es-qa-04
files-v2           7 p STARTED  854807  10.2gb 10.240.147.89  es-qa-02
files-v2           0 r STARTED  147515 711.4mb 10.240.53.190  es-qa-04
files-v2           0 p STARTED  147515 711.4mb 10.240.211.15  es-qa-01
files-v2           3 r STARTED  347552   1.2gb 10.240.53.190  es-qa-04
files-v2           3 p STARTED  347552   1.2gb 10.240.147.89  es-qa-02
files-v2           1 p STARTED  649461   3.5gb 10.240.219.136 es-qa-03
files-v2           1 r STARTED  649461   3.5gb 10.240.147.89  es-qa-02
files-v2           5 r STARTED  488581   3.6gb 10.240.219.136 es-qa-03
files-v2           5 p STARTED  488581   3.6gb 10.240.211.15  es-qa-01
files-v2           6 r STARTED  186067 916.8mb 10.240.147.89  es-qa-02
files-v2           6 p STARTED  186067 916.8mb 10.240.211.15  es-qa-01
files-v2           2 r STARTED  765970   7.8gb 10.240.53.190  es-qa-04
files-v2           2 p STARTED  765970   7.8gb 10.240.219.136 es-qa-03

Upvotes: 0

Views: 196

Answers (2)

Mike Cantrell

Reputation: 726

OK, so I think I found it. I'm using Spring Data's Elasticsearch repository. Inside their save(doc) method, there's a call to refresh:

public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Cannot save 'null' entity.");
    // Index the single document...
    elasticsearchOperations.index(createIndexQuery(entity));
    // ...then force a refresh of the index on every save.
    elasticsearchOperations.refresh(entityInformation.getIndexName(), true);
    return entity;
}

I bypassed this by invoking the API directly, without Spring's abstraction, and CPU usage on all nodes was much, much better. I'm still not quite clear why a refresh would have an effect on 2 nodes (instead of 1 or all), but the issue appears to be resolved.
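To illustrate the difference, here is a minimal sketch that counts how many refreshes each approach triggers. IndexOps, CountingOps and RefreshSketch are hypothetical stand-ins made up for this example; they are not Spring Data or Elasticsearch types, and the sketch does not talk to a cluster.

```java
import java.util.List;

// Hypothetical stand-in for the two operations that save() calls.
// Not a real Spring Data or Elasticsearch interface.
interface IndexOps {
    void index(String doc);
    void refresh();
}

// Counts calls so the two strategies can be compared.
class CountingOps implements IndexOps {
    int indexCalls = 0;
    int refreshCalls = 0;

    @Override
    public void index(String doc) { indexCalls++; }

    @Override
    public void refresh() { refreshCalls++; }
}

class RefreshSketch {
    // Same pattern as the repository's save(): one refresh per document.
    static void saveEach(IndexOps ops, List<String> docs) {
        for (String doc : docs) {
            ops.index(doc);
            ops.refresh();
        }
    }

    // Alternative: index everything first, then refresh once at the end
    // (or skip the explicit refresh and rely on the index's refresh interval).
    static void saveAllThenRefresh(IndexOps ops, List<String> docs) {
        for (String doc : docs) {
            ops.index(doc);
        }
        ops.refresh();
    }
}
```

With N documents, saveEach issues N refreshes while saveAllThenRefresh issues one, which is roughly what changed when I stopped going through the repository's save().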

Upvotes: 0

nevsv

Reputation: 2466

  1. Make sure that the JVM and Elasticsearch configurations are the same on all nodes.
  2. For testing purposes, try making every node hold all of the data (in your case, set the number of replicas to 3).
  3. On how documents map to shards, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/routing-value.html
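The linked page describes the rule as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document's _id. A rough sketch of that rule is below. Note the assumptions: RoutingSketch and standInHash are made up for illustration, and String.hashCode() is only a placeholder for Elasticsearch's own hash function, so the shard numbers it produces will not match a real cluster.

```java
class RoutingSketch {
    // Placeholder hash. Elasticsearch uses its own hash function on the
    // routing value, so these results will NOT match a real cluster.
    static int standInHash(String routing) {
        return routing.hashCode();
    }

    // shard = hash(routing) % number_of_primary_shards, kept non-negative.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        return Math.floorMod(standInHash(routing), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaries = 8; // files-v2 has primary shards 0..7 in the question
        int[] counts = new int[primaries];
        // Made-up document ids, just to show how they spread over shards.
        for (int i = 0; i < 10000; i++) {
            counts[shardFor("doc-" + i, primaries)]++;
        }
        for (int s = 0; s < primaries; s++) {
            System.out.println("shard " + s + ": " + counts[s] + " docs");
        }
    }
}
```

The point is that the target shard depends only on the routing value and the number of primary shards, not on which node receives the request. To check a real document, the _search_shards API accepts a routing parameter and should report which shard that routing value maps to.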

Upvotes: 0
