Reputation: 29967
I have an Elasticsearch index with 60k elements. I know this from the head plugin, and Sense reports the same count (the total appears in the lower right corner of the results pane).
I then wanted to query the same index from Python in two different ways: via a direct requests call and via the elasticsearch module:
import elasticsearch
import json
import requests
# the requests version
data = {"query": {"match_all": {}}}
r = requests.get('http://elk.example.com:9200/nessus_current/_search', data=json.dumps(data))
print(len(r.json()['hits']['hits']))
# the elasticsearch module version
es = elasticsearch.Elasticsearch(hosts='elk.example.com')
res = es.search(index="nessus_current", body={"query": {"match_all": {}}})
print(len(res['hits']['hits']))
In both cases the result is 10, far from the expected 60k. The results themselves make sense (the content is what I expect); there are just very few of them.
To close the loop, I took one of these 10 hits and queried Sense for its _id. As expected, it is found.
So the 10 hits appear to be a subset of the whole index. Why aren't all the elements returned by the Python calls?
Upvotes: 1
Views: 1911
Reputation: 52368
10 is the default size of the results returned by Elasticsearch. If you want more, specify it explicitly, for example "size": 100. Be careful, though: returning all the docs by raising size is not recommended, as it can bring down your cluster (newer versions also reject from + size beyond index.max_result_window, 10,000 by default). To get back all the results, use scan & scroll.
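A minimal sketch of scroll-based retrieval, assuming a client that exposes `search(..., scroll=..., size=...)`, `scroll(scroll_id=..., scroll=...)` and `clear_scroll(scroll_id=...)` the way the 7.x-style elasticsearch-py client does (including the `body=` keyword). `scroll_all` and `FakeClient` are hypothetical names; `FakeClient` is only an in-memory stand-in so the example runs without a cluster, and it ignores the query. In practice, `elasticsearch.helpers.scan` already wraps this loop for you.

```python
def scroll_all(client, index, query, page_size=1000, scroll="2m"):
    """Yield every hit matching `query`, one scroll page at a time (hypothetical helper)."""
    resp = client.search(index=index, body={"query": query}, scroll=scroll, size=page_size)
    scroll_id = resp.get("_scroll_id")
    try:
        # Keep asking for the next page until an empty page comes back.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = client.scroll(scroll_id=scroll_id, scroll=scroll)
            # The server may hand back a new scroll id; always use the latest one.
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once we are done.
        if scroll_id:
            client.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical in-memory stand-in for elasticsearch.Elasticsearch (ignores the query)."""

    def __init__(self, n_docs):
        self.docs = [{"_id": str(i), "_source": {"n": i}} for i in range(n_docs)]
        self.cursors = {}   # scroll_id -> (next position, page size)
        self.cleared = []   # scroll ids passed to clear_scroll

    def _page(self, scroll_id):
        pos, size = self.cursors[scroll_id]
        self.cursors[scroll_id] = (pos + size, size)
        return {"_scroll_id": scroll_id,
                "hits": {"total": len(self.docs), "hits": self.docs[pos:pos + size]}}

    def search(self, index, body, scroll, size):
        scroll_id = "cursor-%d" % len(self.cursors)
        self.cursors[scroll_id] = (0, size)
        return self._page(scroll_id)

    def scroll(self, scroll_id, scroll):
        return self._page(scroll_id)

    def clear_scroll(self, scroll_id):
        self.cleared.append(scroll_id)


client = FakeClient(n_docs=2500)
hits = list(scroll_all(client, "nessus_current", {"match_all": {}}, page_size=1000))
print(len(hits))  # 2500
```

With a real cluster you would pass an `elasticsearch.Elasticsearch(...)` instance instead of `FakeClient`; each round trip then fetches at most `page_size` hits rather than asking for the whole index in one response.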
Also, to get the total number of hits, I think you should read res['hits']['total'], not len(res['hits']['hits']): the latter only counts the documents returned in this page.
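A small sketch of the difference between the returned page and the total, assuming the response is the plain dict from `es.search` (or `r.json()` in the requests version). In Elasticsearch 7 and later `hits.total` is an object (`{"value": ..., "relation": ...}`) rather than an integer, and by default it is only exact up to 10,000 (`relation` becomes `"gte"` beyond that) unless the request sets `track_total_hits`. `total_hits` is a hypothetical helper name, and the sample responses are hand-written, not real cluster output.

```python
def total_hits(response):
    """Return the total hit count from a _search response (hypothetical helper)."""
    total = response["hits"]["total"]
    # Elasticsearch 7+ reports {"value": N, "relation": "eq" | "gte"};
    # older versions report a plain integer.
    if isinstance(total, dict):
        return total["value"]
    return total


# Hand-written sample responses shaped like the two formats (one page of 10 hits each).
old_style = {"hits": {"total": 60000, "hits": [{"_id": str(i)} for i in range(10)]}}
new_style = {"hits": {"total": {"value": 60000, "relation": "eq"},
                      "hits": [{"_id": str(i)} for i in range(10)]}}

print(len(old_style["hits"]["hits"]), total_hits(old_style))  # 10 60000
print(len(new_style["hits"]["hits"]), total_hits(new_style))  # 10 60000
```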
Upvotes: 1