I am using the Elasticsearch 6.1 API for Python, and I am trying to read a certain value from every single document in the database (303 958 documents).
```python
doc = {
    'size': 1000,
    'query': {
        'match_all': {}
    }
}
samplesCount = 0
res = es.search(index="index", doc_type='data', body=doc, scroll='1m')
scrollId = res['_scroll_id']
scrollSize = res['hits']['total']
while scrollSize > 0:
    for x in range (0, len(res['hits']['hits']) - 1):
        name = res['hits']['hits'][x]['_source']['name']
        samplesCount += 1
        print(str(samplesCount) + '. ' + name)
        scrollSize -= 1
    res = es.scroll(scroll_id=scrollId, scroll='1m')
```
The counter (samplesCount) stops at 303 654, and it seems that es.scroll returns no results for the remaining documents (around 300, which is fewer than one scroll page).
What also makes me curious is that it stops at 303 654: I would expect a round number (a multiple of 1000).
Any ideas?
Thank you very much for any helpful tips.
Try replacing `range (0, len(res['hits']['hits']) - 1)` with `range(0, len(res['hits']['hits']))` or (equivalently) `range(len(res['hits']['hits']))`.
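Applied to the whole loop, the fix might look like the sketch below. It assumes an elasticsearch-py 6.x client passed in as `es` and keeps the question's index name, `doc_type` and `name` field; `print_all_names` is a hypothetical helper name. Besides the `range` fix, it iterates over the hits directly, stops on the first empty page instead of counting `scrollSize` down, always continues with the most recent `_scroll_id`, and clears the scroll context at the end.

```python
def print_all_names(es, index="index", page_size=1000, keep_alive="1m"):
    """Scroll through every document and print its 'name' field.

    `es` is assumed to be an elasticsearch-py 6.x client (e.g. Elasticsearch()).
    Returns the number of documents printed.
    """
    body = {"size": page_size, "query": {"match_all": {}}}
    res = es.search(index=index, doc_type="data", body=body, scroll=keep_alive)
    samples_count = 0
    try:
        hits = res["hits"]["hits"]
        while hits:  # an empty page means the scroll is exhausted
            # Iterate over every hit; no index arithmetic that could drop one.
            for hit in hits:
                samples_count += 1
                print(f"{samples_count}. {hit['_source']['name']}")
            # Continue with the scroll id from the latest response.
            res = es.scroll(scroll_id=res["_scroll_id"], scroll=keep_alive)
            hits = res["hits"]["hits"]
    finally:
        # Free the server-side scroll context instead of waiting for it to expire.
        es.clear_scroll(scroll_id=res["_scroll_id"])
    return samples_count
```

For most scripts, elasticsearch-py's own `elasticsearch.helpers.scan` wraps this same scroll loop, so you may not need to write it by hand at all.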
Judging from the syntax and the numbers you quote, it looks like you are skipping one record per iteration of the `while` loop: `range(0, n - 1)` stops at index n - 2, so the last hit of every page is never read.
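The numbers fit exactly, assuming every page except the last one is full. With 303 958 documents and `'size': 1000` there are 303 pages of 1000 hits plus one page of 958, and the original loop reads one hit fewer than each page contains: 303 × 999 + 957 = 303 654. The 304 unread hits also leave `scrollSize` stuck at 304, so the original `while` loop keeps requesting pages after the scroll is exhausted, which matches the observation that `es.scroll` returns nothing for the rest. The small simulation below reproduces the count; `count_with_off_by_one` is a hypothetical name, and it does not talk to Elasticsearch.

```python
def count_with_off_by_one(total, page_size):
    """Simulate the question's loop: a page of n hits yields only n - 1 reads."""
    counted = 0
    remaining = total
    while remaining > 0:
        page = min(page_size, remaining)       # hits returned in this page
        counted += len(range(0, page - 1))     # same bounds as the original loop
        remaining -= page
    return counted

print(count_with_off_by_one(303958, 1000))  # 303654
```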