Segmented

Reputation: 2044

Elasticsearch: Sorted scroll in Python is inconsistent

I am a little confused by the results. I have a simple query to get the most recently added document (sorted by its created date/timestamp):

query = {
    "query": {"match_all": {}},
    "sort": [
        {"created_date": "desc"}
    ],
    "size": 1
}

When I run it with helpers.scan() (the abstraction over the Scroll API), I get a different hit each time. My Elasticsearch cluster is static (no new documents are being added), so the inconsistency is strange: I have sorted all entries and asked for only the first hit (size 1) in my query. What am I missing here?

Upvotes: 1

Views: 1213

Answers (1)

Segmented

Reputation: 2044

For future reference, for anyone who stumbles upon this: the documentation on the Elasticsearch homepage may not clear this up, but the Python client's documentation does. From the docs for helpers.scan():

By default scan does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, use preserve_order=True. This may be an expensive operation and will negate the performance benefits of using scan.

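So, for use cases like this, it is better to use search() than scan(). Without preserve_order=True, scan() replaces your sort with "_doc", so the first document it yields is effectively arbitrary. Also, scan() iterates over every matching document; a size in the query only affects how many hits are fetched per scroll batch, not how many scan() yields in total. A plain search() with the same sort and size 1 returns just the newest document.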
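A minimal sketch of the search() approach, assuming the elasticsearch-py 8.x client (which accepts query=, sort= and size= as keyword arguments; on 7.x you would pass the same keys inside body=) and the created_date field from the question. The function name latest_document is hypothetical.

```python
from typing import Any, Optional


def latest_document(es: Any, index: str) -> Optional[dict]:
    """Return the newest hit in `index` by `created_date`, or None if empty.

    `es` is assumed to be an elasticsearch-py 8.x `Elasticsearch` client.
    """
    resp = es.search(
        index=index,
        query={"match_all": {}},
        # A regular search honours this sort, unlike helpers.scan() without
        # preserve_order=True. Documents that share the same created_date
        # may still come back in any order among themselves.
        sort=[{"created_date": {"order": "desc"}}],
        size=1,  # only the single newest document is needed
    )
    hits = resp["hits"]["hits"]
    return hits[0] if hits else None
```

For example, with `es = Elasticsearch("http://localhost:9200")` (a placeholder URL), `latest_document(es, "my-index")` returns the top hit, whose fields are under `"_source"`. Elasticsearch does the sorting and sends back one document, instead of the whole index being streamed through scan().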

Upvotes: 4
