Elasticsearch search query returns a different number of documents on each run

Question

Some background on the Elasticsearch instance:

One node, on one machine
The index in question holds 2.6 billion documents and is 1.23 TB in size
The index is split into 4 shards
Heap size is set to 30 GB
The server has 256 GB of RAM and 40 cores
Elasticsearch (version 1.4.3) is the only thing running on this server

I want to return all documents that have a specific name. The name attribute is mapped as follows:

    "name": {
        "type": "string",
        "index": "not_analyzed"
    }

I have tried different types of search (filter, query_string, term), all with the same result. The current query looks like this:

    {
        "query": {
            "query_string": {
                "default_field": "name",
                "query": "test_run_435_tc"
            }
        },
        "size": 10000000
    }
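
For reference, the term-filter variant mentioned above might look like the following. This is a minimal sketch assuming the `filtered` query syntax of Elasticsearch 1.x; `build_name_filter_body` is a hypothetical helper name, not part of any library.

```python
def build_name_filter_body(name, size=10):
    """Build a search body matching documents whose `name` equals `name` exactly.

    Assumes the Elasticsearch 1.x `filtered` query syntax.
    """
    return {
        "query": {
            "filtered": {
                # A term filter compares against the indexed token as-is,
                # which is what a not_analyzed string field stores.
                "filter": {"term": {"name": name}}
            }
        },
        "size": size,
    }


# size=0 asks only for the total hit count, not the documents themselves.
body = build_name_filter_body("test_run_435_tc", size=0)
```

The resulting dictionary can be passed as the `body` argument of the client's `search` call in place of the `query_string` body.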

The problem is that the query does not return the right number of documents on the first try. I know for a fact that about 45000 documents with the name "test_run_435_tc" exist in the index.

But when the query is run for the first time, it returns around 5000 documents. If I repeat the query several times in quick succession, the number of returned documents increases. After about 3-4 runs, I get the right number of documents in the result.

I am using elasticsearch-py as the client.

It seems like Elasticsearch is warming up, and after a few runs of the same query it returns the correct number of documents.

Why is Elasticsearch behaving like this? Is this normal behaviour for Elasticsearch, or am I missing something? Of course, I would like to get the correct result on the first try.

Updates based on comments:

The "size" : 10000000 dates from when I did not yet know how many documents with the same name were in the index.

When setting "size" : 0 and executing the query, this is the response:

    {u'_shards': {u'failed': 0, u'successful': 4, u'total': 4},
     u'hits': {u'hits': [], u'max_score': 0.0, u'total': 28754},
     u'timed_out': True,
     u'took': 130}

When running the same query again with "size" : 0, this is the response:

    {u'_shards': {u'failed': 0, u'successful': 4, u'total': 4},
     u'hits': {u'hits': [], u'max_score': 0.0, u'total': 39223},
     u'timed_out': True,
     u'took': 134}
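
Both responses above report `timed_out` as True, which in Elasticsearch means the search hit its time limit and the hit count may be partial. A minimal sketch of a check for this in client code follows; `is_partial` is a hypothetical helper name, and the sample dictionary is the first response shown above.

```python
def is_partial(response):
    """Return True when a search response may not cover all matching documents."""
    shards = response.get("_shards", {})
    timed_out = bool(response.get("timed_out"))
    shard_failures = shards.get("failed", 0) > 0
    shards_missing = shards.get("successful", 0) < shards.get("total", 0)
    return timed_out or shard_failures or shards_missing


first_response = {
    "_shards": {"failed": 0, "successful": 4, "total": 4},
    "hits": {"hits": [], "max_score": 0.0, "total": 28754},
    "timed_out": True,
    "took": 130,
}

print(is_partial(first_response))  # True, because timed_out is set
```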

Running the same query with "size": 0, but with the parameters .....?timeout=100000&search_type=count, returns this response:

    {
        "took": 525,
        "timed_out": false,
        "_shards": {
            "total": 4,
            "successful": 4,
            "failed": 0
        },
        "hits": {
            "total": 49501,
            "max_score": 0,
            "hits": []
        }
    }

The response above, which reports 49501 total hits, actually gives the correct number of hits on the first try!
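
The same request could be sketched from Python roughly as follows. This assumes the installed elasticsearch-py version accepts `timeout` and `search_type` as keyword arguments to `search` and forwards them as URL parameters; `count_by_name` and the index name `my_index` are hypothetical and used only for illustration.

```python
def count_by_name(es, index, name, timeout=100000):
    """Return (total_hits, timed_out) for documents whose `name` matches `name`.

    `es` is expected to be an elasticsearch-py client (or any object with a
    compatible `search` method). The timeout value mirrors the URL parameter
    used in the question.
    """
    body = {
        "query": {
            "query_string": {
                "default_field": "name",
                "query": name,
            }
        }
    }
    # search_type="count" asks only for the hit count, as in the URL above.
    response = es.search(
        index=index,
        body=body,
        search_type="count",
        timeout=timeout,
    )
    return response["hits"]["total"], bool(response.get("timed_out"))


# Usage (assumes a reachable cluster and the elasticsearch package):
#   from elasticsearch import Elasticsearch
#   total, timed_out = count_by_name(Elasticsearch(), "my_index", "test_run_435_tc")
```

Returning the `timed_out` flag alongside the count lets the caller tell a complete count from a truncated one.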
