Reputation: 726
Background
With our Elasticsearch nodes, I've noticed very high CPU usage relative to I/O throughput when indexing documents (queries seem to be fine). I was able to increase throughput via vertical scaling (adding more CPUs to the servers), but I wanted to see what kind of increase I would get by scaling horizontally (doubling the number of nodes from 2 to 4).
Problem
I expected to see increased throughput with the expanded cluster size, but performance was actually a little worse. I also noticed that half of the nodes reported very little I/O and CPU usage.
Research
I saw that the primary shard distribution was uneven, so I shuffled some shards around using the cluster reroute API. This didn't really have any effect other than changing which two nodes were being used.
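For reference, a shard move like that goes through the cluster reroute API; a minimal sketch, using the index and node names from the listing below:

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "files-v2",
        "shard": 0,
        "from_node": "es-qa-01",
        "to_node": "es-qa-03"
      }
    }
  ]
}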
The _search_shards API indicates that all nodes and shards should participate.
Question
I'm not sure why only two nodes are participating in indexing. Once a document has been indexed, is there a way to see which shard it resides in? Is there something obvious that I'm missing?
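(On the "which shard" part of my own question: by default the routing value is the document _id, and the target shard is hash(routing) % number_of_primary_shards. Assuming no custom routing, the search shards API can answer directly, e.g.:

GET /files-v2/_search_shards?routing=<document-id>

The response lists the shard copies that routing value maps to.)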
Setup
Stats
Shards
files-v2 4 r STARTED 664644 8.4gb 10.240.219.136 es-qa-03
files-v2 4 p STARTED 664644 8.4gb 10.240.211.15 es-qa-01
files-v2 7 r STARTED 854807 10.5gb 10.240.53.190 es-qa-04
files-v2 7 p STARTED 854807 10.2gb 10.240.147.89 es-qa-02
files-v2 0 r STARTED 147515 711.4mb 10.240.53.190 es-qa-04
files-v2 0 p STARTED 147515 711.4mb 10.240.211.15 es-qa-01
files-v2 3 r STARTED 347552 1.2gb 10.240.53.190 es-qa-04
files-v2 3 p STARTED 347552 1.2gb 10.240.147.89 es-qa-02
files-v2 1 p STARTED 649461 3.5gb 10.240.219.136 es-qa-03
files-v2 1 r STARTED 649461 3.5gb 10.240.147.89 es-qa-02
files-v2 5 r STARTED 488581 3.6gb 10.240.219.136 es-qa-03
files-v2 5 p STARTED 488581 3.6gb 10.240.211.15 es-qa-01
files-v2 6 r STARTED 186067 916.8mb 10.240.147.89 es-qa-02
files-v2 6 p STARTED 186067 916.8mb 10.240.211.15 es-qa-01
files-v2 2 r STARTED 765970 7.8gb 10.240.53.190 es-qa-04
files-v2 2 p STARTED 765970 7.8gb 10.240.219.136 es-qa-03
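(A listing like the one above comes from the cat shards API, e.g. GET _cat/shards/files-v2?v.)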
Upvotes: 0
Views: 196
Reputation: 726
OK, so I think I found it. I'm using Spring Data's Elasticsearch repository. Inside its save(doc) method, there's a call to refresh:
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Cannot save 'null' entity.");
    elasticsearchOperations.index(createIndexQuery(entity));
    elasticsearchOperations.refresh(entityInformation.getIndexName(), true);
    return entity;
}
I bypassed this by invoking the Elasticsearch API directly, without Spring's abstraction, and CPU usage across all nodes was much, much better. I'm still not quite clear why a refresh would have an effect on two nodes (rather than one or all of them), but the issue appears to be resolved.
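For anyone hitting the same thing, here's a minimal sketch of the bypass, assuming the same ElasticsearchOperations bean the repository uses internally (FileDocument and getId() are stand-ins for your own @Document entity):

import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.data.elasticsearch.core.query.IndexQueryBuilder;

public class FileDocumentIndexer {

    private final ElasticsearchOperations elasticsearchOperations;

    public FileDocumentIndexer(ElasticsearchOperations elasticsearchOperations) {
        this.elasticsearchOperations = elasticsearchOperations;
    }

    // Index one document without the per-document refresh that the
    // repository's save() issues.
    public String index(FileDocument doc) {
        IndexQuery query = new IndexQueryBuilder()
                .withId(doc.getId())   // document id, same default routing key
                .withObject(doc)       // entity serialized as _source
                .build();
        // Same call the repository makes internally, minus the trailing
        // refresh(indexName, true); documents become searchable on the
        // index's normal refresh_interval instead.
        return elasticsearchOperations.index(query);
    }
}

// Stand-in for your own @Document entity.
class FileDocument {
    private String id;
    String getId() { return id; }
}

If you're indexing in volume, batching through elasticsearchOperations.bulkIndex(List&lt;IndexQuery&gt;) should cut per-request overhead further.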
Upvotes: 0