Reputation: 726

Elasticsearch nodes not participating in indexing

Background

With our Elasticsearch nodes, I've noticed very high CPU usage relative to I/O throughput when indexing documents (queries seem to be OK). I was able to increase throughput by scaling vertically (adding more CPUs to the servers), but I wanted to see what kind of increase I would get by scaling horizontally (doubling the number of nodes from 2 to 4).

Problem

I expected throughput to increase with the larger cluster, but performance was actually a little worse. I also noticed that half of the nodes reported very little I/O and CPU usage.

Research

I saw that the primary shards were unevenly distributed, so I moved some of them around using the reroute API. This had no real effect other than changing which two nodes were being used.

The _search_shards API indicates that all nodes and shards should participate.

Question

I'm not sure why only two nodes are participating in indexing. Once a document has been indexed, is there a way to see which shard it resides in? Is there something obvious that I'm missing?

Setup

Servers: 2 CPU, 10g JVM, 18G RAM, 500G SSD
Index: 8 shards, 1 replica
Routing Key: _id
Total Document Count: 4.1M
Index Document Count: 50k
Avg Document Size: 14.6K
Max Document Size: 32.4M

Stats

Shards (columns: index, shard, primary/replica, state, docs, store size, IP, node)

files-v2           4 r STARTED  664644   8.4gb 10.240.219.136 es-qa-03
files-v2           4 p STARTED  664644   8.4gb 10.240.211.15  es-qa-01
files-v2           7 r STARTED  854807  10.5gb 10.240.53.190  es-qa-04
files-v2           7 p STARTED  854807  10.2gb 10.240.147.89  es-qa-02
files-v2           0 r STARTED  147515 711.4mb 10.240.53.190  es-qa-04
files-v2           0 p STARTED  147515 711.4mb 10.240.211.15  es-qa-01
files-v2           3 r STARTED  347552   1.2gb 10.240.53.190  es-qa-04
files-v2           3 p STARTED  347552   1.2gb 10.240.147.89  es-qa-02
files-v2           1 p STARTED  649461   3.5gb 10.240.219.136 es-qa-03
files-v2           1 r STARTED  649461   3.5gb 10.240.147.89  es-qa-02
files-v2           5 r STARTED  488581   3.6gb 10.240.219.136 es-qa-03
files-v2           5 p STARTED  488581   3.6gb 10.240.211.15  es-qa-01
files-v2           6 r STARTED  186067 916.8mb 10.240.147.89  es-qa-02
files-v2           6 p STARTED  186067 916.8mb 10.240.211.15  es-qa-01
files-v2           2 r STARTED  765970   7.8gb 10.240.53.190  es-qa-04
files-v2           2 p STARTED  765970   7.8gb 10.240.219.136 es-qa-03

Upvotes: 0

Answers (2)

Mike Cantrell

Reputation: 726

OK, so I think I found it. I'm using Spring Data's Elasticsearch repository. Inside their save(doc) method, there's a call to refresh:

public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Cannot save 'null' entity.");
    elasticsearchOperations.index(createIndexQuery(entity));
    // Forces an index refresh after every single saved document.
    elasticsearchOperations.refresh(entityInformation.getIndexName(), true);
    return entity;
}

I bypassed this by calling the API directly, without Spring's abstraction, and CPU usage on all nodes was much, much better. I'm still not clear why a refresh would affect 2 nodes (rather than 1 or all of them), but the issue appears to be resolved.
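
For anyone wanting to do the same, one way to index without a per-document refresh is to batch documents through the _bulk endpoint and let the index's periodic refresh make them searchable. The following is only a minimal sketch of building the newline-delimited request body. The class and method names are hypothetical, the ids are assumed to need no JSON escaping, each source is assumed to be single-line JSON, and older Elasticsearch versions also expect a `_type` in the action line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkBodySketch {

    // Builds a _bulk body: one action line followed by one source line per document.
    // Assumption: ids need no JSON escaping and every source is valid single-line JSON.
    static String buildBulkBody(String index, Map<String, String> sourcesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : sourcesById.entrySet()) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            body.append(entry.getValue()).append('\n');
        }
        // Every line, including the last one, ends with a newline.
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"a.txt\"}");
        docs.put("2", "{\"name\":\"b.txt\"}");
        // The resulting string would be POSTed to <host>/_bulk by whatever HTTP client you use.
        System.out.print(buildBulkBody("files-v2", docs));
    }
}
```

The body is sent in a single request, so the cluster sees one indexing call for the whole batch instead of an index call plus a refresh for each document.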

Upvotes: 0

nevsv

Reputation: 2466

Make sure that the JVM and Elasticsearch configurations are the same on all nodes.
For testing purposes, try making every node hold all of the data (in your case, set the number of replicas to 3).
For how a document maps to a shard, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/routing-value.html
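
The linked page describes the rule as shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document's _id. Below is a rough sketch of that rule for illustration. The class and method names are hypothetical, and String.hashCode is only a stand-in for Elasticsearch's own hash function, so the shard numbers it produces will not match a real cluster. To ask the cluster itself, the _search_shards API you already used should also accept a routing parameter (for example, /files-v2/_search_shards?routing=<id>).

```java
import java.util.List;

public class RoutingSketch {

    // Stand-in hash. Assumption: Elasticsearch uses its own hash function,
    // so results here will NOT match the shard numbers of a real cluster.
    static int standInHash(String routing) {
        return routing.hashCode();
    }

    // shard = hash(routing) % number_of_primary_shards, kept non-negative.
    static int shardFor(String routing, int primaryShards) {
        return Math.floorMod(standInHash(routing), primaryShards);
    }

    // Counts how many routing values land on each primary shard.
    static int[] distribution(List<String> routingValues, int primaryShards) {
        int[] counts = new int[primaryShards];
        for (String value : routingValues) {
            counts[shardFor(value, primaryShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc-1", "doc-2", "doc-3", "doc-4");
        int[] counts = distribution(ids, 8); // 8 primary shards, as in files-v2
        for (int shard = 0; shard < counts.length; shard++) {
            System.out.println("shard " + shard + ": " + counts[shard]);
        }
    }
}
```

Because the shard depends only on the routing value and the number of primary shards, moving shards between nodes changes which node receives a document but not which shard it belongs to.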

Upvotes: 0
