caarlos0

Reputation: 20633

ElasticSearch cluster eventually starts returning wrong hits.total on search

We have a cluster running version 5.6.16. It has ~5.7k primary shards, ~2k indices and 28 nodes (3 masters, 3 coordinators and 22 data nodes):

{
  "cluster_name": "foo",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 28,
  "number_of_data_nodes": 22,
  "active_primary_shards": 5778,
  "active_shards": 11556,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}

Eventually, any search against some of the indices returns an insanely high hit count, even when no results are found.

For instance:

curl -s 'http://localhost:9200/*/_search?q=nope:thiswillneverexist&terminate_after=1' | jq -r '.'
{
  "took": 871,
  "timed_out": false,
  "terminated_early": false,
  "num_reduce_phases": 12,
  "_shards": {
    "total": 5778,
    "successful": 5778,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 9787770,
    "max_score": 2,
    "hits": []
  }
}

I couldn't find anything in the logs, any correlation with other events, or any similar reports in issue trackers or forums (maybe I don't know exactly what to search for).

The only workaround we found so far is to restart the cluster.

Has anyone seen something like this? Any ideas on what I should investigate?

Upvotes: 3

Views: 688

Answers (1)

Amit

Reputation: 32376

This is really strange. To debug the issue, I would check the following:

  1. Why is the inner hits array empty? By default, Elasticsearch returns up to 10 matching documents in that array, each with its index name and document ID, which would help verify whether that index and document actually contain the search term.
  2. Look at the Elasticsearch query logs and compare what happens in both cases (while the issue is occurring vs. after a cluster restart).
  3. Is it intermittent, or does it happen on every search? And how long after a cluster restart does it come back?
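
To narrow down the first point, one option is to run the never-matching query from the question against each index separately and flag the ones that report a positive hits.total while returning an empty hits array. The following is only a minimal sketch under assumptions: it uses the same http://localhost:9200 endpoint and query string as the question, lists indices through _cat/indices with format=json, and the helper names (get_json, is_suspicious, find_suspicious_indices) are hypothetical, not part of any Elasticsearch client.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"   # assumption: endpoint taken from the question
QUERY = "nope:thiswillneverexist"  # the never-matching query from the question


def get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def is_suspicious(response):
    """True when hits.total is positive but the hits array is empty."""
    hits = response.get("hits", {})
    return hits.get("total", 0) > 0 and not hits.get("hits")


def find_suspicious_indices(fetch=get_json, base_url=ES_URL, query=QUERY):
    """Run the query per index and return {index_name: reported_total}."""
    indices = fetch(base_url + "/_cat/indices?format=json&h=index")
    suspicious = {}
    for entry in indices:
        name = entry["index"]
        url = "{}/{}/_search?q={}".format(
            base_url,
            urllib.parse.quote(name, safe=""),
            urllib.parse.quote(query, safe=""),
        )
        try:
            response = fetch(url)
        except urllib.error.HTTPError:
            # Skip indices that cannot be searched (for example, closed ones).
            continue
        if is_suspicious(response):
            suspicious[name] = response["hits"]["total"]
    return suspicious
```

Calling find_suspicious_indices() while the problem is occurring, and again right after a restart, would give a per-index list to compare. With roughly 2k indices this issues one request per index, so it is meant as a one-off diagnostic rather than something to run continuously.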

Upvotes: 2
