ruba

Reputation: 120

How can I use scan/scroll with pagination and sort in ElasticSearch?

I have an ES DB storing history records from a process I run every day. Because I want to show only 20 records per page in the history (ordered by date), I was using pagination (size + from_) combined with scroll, which worked just fine. But when I added sort to the query, it stopped working, and I found that scroll with sort doesn't work. Looking for an alternative, I tried the ES helper scan, which works fine for scrolling and sorting the results, but with this solution pagination doesn't seem to work. I don't understand why, since the API docs say that scan passes all additional parameters to the underlying search function. So my question is whether there is any way to combine the three options.

Thanks,

Ruben

Upvotes: 2

Views: 4369

Answers (2)

scribu

Reputation: 3078

When using the elasticsearch.helpers.scan function, you need to pass preserve_order=True to enable sorting.

(Tested using elasticsearch==7.5.1)
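A minimal sketch of how this might look with the Python client, assuming a 7.x `elasticsearch` package, an index named `history`, and a `date` field to sort on (the index name, field name, and the helper functions here are illustrative, not taken from the question). Since the scroll API does not support an offset, the sketch takes a page by slicing the ordered iterator rather than by passing `from_`:

```python
from itertools import islice


def build_history_query(sort_field="date", order="desc"):
    """Build a search body sorted on one field (the field name is an assumption)."""
    return {
        "query": {"match_all": {}},
        "sort": [{sort_field: {"order": order}}],
    }


def take_page(hits, page, per_page=20):
    """Return the hits of a zero-based page from an already sorted iterable."""
    start = page * per_page
    return list(islice(hits, start, start + per_page))


def iter_history(client, index="history"):
    """Return a generator over every matching hit, in sort order."""
    # Imported lazily so the two helpers above work without the client installed.
    from elasticsearch.helpers import scan

    return scan(
        client,
        query=build_history_query(),
        index=index,          # hypothetical index name
        preserve_order=True,  # otherwise the 7.x helper replaces the sort with "_doc"
    )
```

With a connected client `es`, `take_page(iter_history(es), page=2)` would return records 41 to 60. Note that this still scrolls through the earlier pages first, so it suits exports better than deep, interactive paging.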

Upvotes: 5

Carlos Rodriguez

Reputation: 883

Yes, you can combine scroll with sort. However, if you want to sort on a string field, you need to change its mapping for the sort to work properly (Documentation Here):

In order to sort on a string field, that field should contain one term only: the whole not_analyzed string. But of course we still need the field to be analyzed in order to be able to query it as full text.

The naive approach to indexing the same string in two ways would be to include two separate fields in the document: one that is analyzed for searching, and one that is not_analyzed for sorting.

"tweet": { 
    "type":     "string",
    "analyzer": "english",
    "fields": {
        "raw": { 
            "type":  "string",
            "index": "not_analyzed"
        }
    }
}
  • The main tweet field is just the same as before: an analyzed full-text field.
  • The new tweet.raw subfield is not_analyzed.

Now, or at least as soon as we have reindexed our data, we can use the tweet field for search and the tweet.raw field for sorting:

GET /_search
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    },
    "sort": "tweet.raw"
}
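The mapping quoted above uses the older `string` / `not_analyzed` syntax. From Elasticsearch 5.0 onward these became the `text` and `keyword` types, so on a newer cluster the same idea might be sketched as below. The function names and the `tweets` index are hypothetical, and the commented client calls assume the 7.x Python client:

```python
def build_tweet_mapping():
    """Mapping with an analyzed field for search and a keyword subfield for sorting."""
    return {
        "properties": {
            "tweet": {
                "type": "text",         # analyzed, used for full-text search
                "analyzer": "english",
                "fields": {
                    "raw": {"type": "keyword"}  # whole string as one term, used for sorting
                },
            }
        }
    }


def build_sorted_search(text):
    """Search on the analyzed field, sort on the keyword subfield."""
    return {
        "query": {"match": {"tweet": text}},
        "sort": ["tweet.raw"],
    }


# Possible usage with a 7.x client (assumed, not run here):
#   es.indices.create(index="tweets", body={"mappings": build_tweet_mapping()})
#   es.search(index="tweets", body=build_sorted_search("elasticsearch"))
```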

Upvotes: 1
