Andrei Stalbe

Reputation: 1531

Elasticsearch slow response time

Elasticsearch cluster. Marvel dashboard

As you can see in the attached screenshot, our cluster has 11 nodes, but one of the non-master nodes is always in the red at 99% CPU. The overloaded node is not always the same one; it moves from one node to another. At the same time, all query responses have recently become really slow (a simple query can take between 5 and 8 seconds). I have dug through dozens of forums and resources about both Elasticsearch and Java and couldn't find a solution, or even a clue about how to solve this.

Any help and/or thoughts will be really appreciated. If there's a need for more info about servers, do not hesitate to ask and I'll provide updates.

Thank You.

Upvotes: 2

Views: 4037

Answers (1)

Nick

Reputation: 2613

It's difficult to answer this without going into a lot of details about your indexes and what kind of queries you're doing. I had a similar experience with fewer nodes but one that was always at max CPU. Here's what I learned in order of importance:

  1. Upgrade ES to its latest version (1.5 at time of writing). There are HUGE performance differences and improvements between 1.2 and 1.5.
  2. Make sure all your nodes are running the same version of ES and Oracle Java 8.
  3. Unless you use specific routing, a query will hit all nodes but the response will be prepared by one node. Depending on the amount of data being processed, this could explain a constant high CPU usage.
  4. Make sure your clients are not connecting to the same node every time. Implement round-robin across N nodes to distribute the query load.
  5. Optimize your queries. Use routing when possible, keep your indexes small and, if possible, create them with logical breakdowns (e.g. by timestamp, user data, client, or category).
  6. Keep fielddata as small as possible and as optimized as possible. The higher the amount of fielddata necessary to execute a query, the more CPU your node will use and the slower your cluster gets. Check out doc_values.
  7. Do you really need 11 nodes? If your queries don't use routing, every query you do will hit each and every node. Each node in turn sends its answer to the processing node. This means a lot more work for the processing node since it now has to potentially combine data from 11 different nodes vs 2 or 3.
  8. With 11 nodes, you might also consider having one or two dedicated to processing queries (i.e. they don't store data) and the others dedicated to storing data.
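
To illustrate point 4, here is a minimal sketch of client-side round-robin in Python. The node addresses and the `RoundRobinNodes` helper are made up for the example, and some client libraries may already rotate over a list of hosts for you, so check yours before rolling your own:

```python
import itertools


class RoundRobinNodes:
    """Hand out node URLs in rotation so no single node receives every query."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        # itertools.cycle repeats the list endlessly in its original order.
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        """Return the URL of the node that should receive the next query."""
        return next(self._cycle)


# Hypothetical addresses; replace with the nodes of your own cluster.
nodes = RoundRobinNodes([
    "http://es-node-1:9200",
    "http://es-node-2:9200",
    "http://es-node-3:9200",
])

# Four consecutive queries go to nodes 1, 2, 3 and then back to 1.
picked = [nodes.next_node() for _ in range(4)]
```

Each query would then be sent to `nodes.next_node()` instead of a hard-coded host. This sketch is not thread-safe and does not skip dead nodes, so treat it as a starting point only.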

The Elasticsearch team is making great progress with every release, so the first thing to do whenever possible is upgrade to the latest stable release. I went from 1.3 to 1.5 and many problems just disappeared :)

Upvotes: 2
