Reputation: 2044
I am a little confused by the results. I have a simple query to get the most recently added document (sorted by created date or timestamp):
query = {
    "query": {"match_all": {}},
    "sort": [
        {"created_date": "desc"}
    ],
    "size": 1
}
When I use helpers.scan(), an abstraction over the Scroll API, I get a different hit each time (inconsistent). My Elasticsearch cluster is static (no new documents are being added), so the inconsistency is strange: I have sorted all entries and asked for only the first hit (size 1) in my query. What am I missing here?
Upvotes: 1
Views: 1213
Reputation: 2044
For future reference, for people who stumble upon this: the documentation on the Elasticsearch homepage may not clear this up, but the Python client has very good documentation. As per helpers.scan():
By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan
So, for use cases like this, it is better to use search() than scan().
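To make that concrete, here is a minimal sketch of fetching the latest document with search(). The helper names build_latest_query and latest_document are my own (hypothetical, not part of the library), the index name is a placeholder, and I am assuming a client version whose search() still accepts body=:

```python
def build_latest_query(sort_field="created_date"):
    """Same query as in the question: match everything, newest first, one hit."""
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: "desc"}],
        "size": 1,
    }


def latest_document(client, index, sort_field="created_date"):
    """Return the newest hit from `index`, or None if the index is empty.

    `client` is assumed to be an elasticsearch.Elasticsearch instance
    (anything exposing a compatible search() method works).
    Unlike helpers.scan(), search() applies the sort before returning hits.
    """
    response = client.search(index=index, body=build_latest_query(sort_field))
    hits = response["hits"]["hits"]
    return hits[0] if hits else None
```

Usage would be something like latest_document(Elasticsearch(...), "my-index"), with your own connection settings. If you really do need to scroll in sorted order, helpers.scan(client, query=query, index="my-index", preserve_order=True) is the documented option, with the performance cost quoted above.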
Upvotes: 4