Why is the reported number of hits from elasticsearch different depending on the query method?

Question

I have an elasticsearch index which holds 60k elements. I confirmed that count with the head plugin, and Sense reports the same total for the query result.

I then wanted to query the same index from Python in two different ways: via a direct requests call and using the elasticsearch module:

import elasticsearch
import json
import requests

# the requests version
data = {"query": {"match_all": {}}}
r = requests.get('http://elk.example.com:9200/nessus_current/_search', data=json.dumps(data))
print(len(r.json()['hits']['hits']))

# the elasticsearch module version
es = elasticsearch.Elasticsearch(hosts='elk.example.com')
res = es.search(index="nessus_current", body={"query": {"match_all": {}}})
print(len(res['hits']['hits']))

In both cases the result is 10 - far from the expected 60k. The results of the query make sense (the content is what I expect), it is just that there are only a few of them.

I took one of these 10 hits and queried for its _id in Sense to close the loop. As expected, it was found.

So it looks like the 10 hits are a subset of the whole index. Why aren't all elements returned by the Python versions of the calls?

Andrei Stefan · Accepted Answer

10 is the default number of results Elasticsearch returns for a search. If you want more, set size in the request body, for example "size": 100. Be careful, though: returning all the docs by using a very large size is not recommended, as it can bring down your cluster. To get back all the results, use scan and scroll.
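A minimal sketch of the scroll approach with the elasticsearch module, assuming a client like the `es` object from the question. The helper name `iter_all_hits`, the page size of 1000, and the "2m" keep-alive are illustrative choices, not values from the question; the calls used are `search` with a `scroll` parameter, `scroll`, and `clear_scroll`.

```python
def iter_all_hits(es, index, page_size=1000, keep_alive="2m"):
    """Yield every hit in `index`, one page at a time, via the scroll API.

    `es` is assumed to be an elasticsearch.Elasticsearch client
    (hypothetical helper; page_size and keep_alive are illustrative).
    """
    body = {"query": {"match_all": {}}, "size": page_size}
    # The first request opens the scroll context and returns the first page.
    page = es.search(index=index, body=body, scroll=keep_alive)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit
            # Each follow-up request returns the next page of the same search.
            page = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context when done or on error.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)


# Usage (assumes the host and index from the question):
# import elasticsearch
# es = elasticsearch.Elasticsearch(hosts='elk.example.com')
# docs = list(iter_all_hits(es, "nessus_current"))
# print(len(docs))
```

The library also ships `elasticsearch.helpers.scan`, which wraps the same loop; the explicit version is shown here only to make the paging visible.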

Also, len(res['hits']['hits']) only counts the hits returned in this response. To get the total number of matching documents, I think you should read res['hits']['total'] instead.
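To make the difference concrete, here is a small sketch that reads both numbers from a search response. The helper name `count_hits` and the sample response are made up for illustration. One detail not covered in the answer: in Elasticsearch 7 and later, `hits.total` is an object with a `value` field rather than a plain integer, so the sketch handles both shapes.

```python
def count_hits(response):
    """Return (total_matches, hits_in_this_response) for a search response.

    Hypothetical helper: `response` is the dict returned by es.search()
    or by r.json() in the requests version.
    """
    hits = response["hits"]
    total = hits["total"]
    # Elasticsearch 7+ reports {"value": N, "relation": "eq"} here;
    # older versions report a plain integer.
    if isinstance(total, dict):
        total = total["value"]
    return total, len(hits["hits"])


# Made-up response shaped like the one in the question: 60k matches,
# but only the default page of 10 hits in the body.
sample = {"hits": {"total": 60000, "hits": [{"_id": str(i)} for i in range(10)]}}
print(count_hits(sample))  # (60000, 10)
```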
