Reputation: 73
I'm working with a huge Elasticsearch index (5 million documents) and I need to fetch data using a sliced scroll in Python. My question: is there a way to limit a sliced scroll (set the size param)? I tried setting it via [search obj].params(size=500000) or by slicing with [:500000], but neither seems to work; the sliced scroll still returns all documents.
In my script, I'm combining sliced scroll with Python multiprocessing as shown here: https://github.com/elastic/elasticsearch-dsl-py/issues/817
Is there some way to get, for example, only 500,000 documents using a sliced scroll?
Thanks in advance.
Upvotes: 0
Views: 1849
Reputation: 73
Answer from GitHub:
"There is no limit on scroll, it always returns all documents. To only get a subset simply stop consuming the iterator after you get the number you wanted to retrieve by using a break statement or similar."
https://github.com/elastic/elasticsearch-dsl-py/issues/817
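As the reply suggests, the scroll itself cannot be capped, but you can stop consuming the iterator once you have enough hits, either with a `break` or with `itertools.islice`. A minimal sketch of both patterns; note that `fetch_docs` below is a hypothetical stand-in for `Search(...).scan()`, since the real call needs a live Elasticsearch cluster:

```python
from itertools import islice

def fetch_docs():
    """Hypothetical stand-in for Search(...).scan(): yields hits one by one."""
    for i in range(5_000_000):
        yield {"_id": i}

LIMIT = 500_000

# Option 1: break out of the loop once the limit is reached.
count = 0
for hit in fetch_docs():
    count += 1  # process hit here
    if count >= LIMIT:
        break

# Option 2: let itertools.islice cap the iterator for you.
docs = list(islice(fetch_docs(), LIMIT))
print(len(docs))  # 500000
```

If you are running one scan per slice across multiple processes, divide the overall limit by the number of slices so each worker stops after its share.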
Upvotes: 1