Yeggeps

Reputation: 2105

Random order & pagination Elasticsearch

This issue contains a feature request for random ordering with an optional seed, which would allow the same random order to be recreated.

I need to be able to paginate randomly ordered results. How can this be done with Elasticsearch 0.19.1?

Thanks.

Upvotes: 43

Views: 40689

Answers (6)

Şafak Saylam

Reputation: 31

New format:

{
    "sort": {
        "_script": {
            "type": "number",
            "script": {
                "source": "Math.random()",
                "lang": "painless"
            },
            "order": "asc"
        }
    }
}

Upvotes: 3

imotov

Reputation: 30163

You can sort using a hash function of a unique field (for example id) and a random salt. Depending on how truly random the results should be, you can do something as primitive as:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "(doc['_id'].value + salt).hashCode()",
        "type" : "number",
        "params" : {
            "salt" : "some_random_string"
        },
        "order" : "asc"
    }
  }
}

or something as sophisticated as:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "org.elasticsearch.common.Digest.md5Hex(doc['_id'].value + salt)",
        "type" : "string",
        "params" : {
            "salt" : "some_random_string"
        },
        "order" : "asc"
    }
  }
}

The second example will produce more random results but will be somewhat slower.

For this approach to work the field _id has to be stored. Otherwise, the query will fail with NullPointerException.
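
To see why a fixed salt makes the order paginatable, here is a rough sketch in plain Python that mimics the second query outside of Elasticsearch. The helper names `salted_order` and `page` and the sample ids are made up for illustration; `start` and `size` stand in for the search request's from/size.

```python
import hashlib


def salted_order(doc_ids, salt):
    """Order ids by the MD5 hex digest of id + salt."""
    return sorted(
        doc_ids,
        key=lambda doc_id: hashlib.md5((doc_id + salt).encode("utf-8")).hexdigest(),
    )


def page(doc_ids, salt, start, size):
    """Return one page of the salted ordering (start/size act like from/size)."""
    return salted_order(doc_ids, salt)[start:start + size]


# Hypothetical document ids, just for the demonstration.
ids = [str(n) for n in range(1, 11)]

# Same salt on every request: the pages never overlap and together cover all ids.
first = page(ids, "some_random_string", 0, 5)
second = page(ids, "some_random_string", 5, 5)
```

Keep the salt for as long as a user is paging through one result set, and pick a new salt when you want a fresh shuffle.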

Upvotes: 47

Andy

Reputation: 735

Well, I was looking at doing this, and all the approaches above seemed a little "too complicated" for something that should be relatively simple. So I came up with an alternative that works perfectly well without the need of "going mental".

I perform a _count query first, then use rand(0, $count) as the "from" offset of the search.

e.g.

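// $params: the request array sent to Elasticsearch via the PHP client
$size = 10; // page size (example value)
$count = $ElasticSearchClient->count($params);
$total_results = $count['count'];
// choose an offset that still leaves a full page of results
$start = rand(0, max(0, $total_results - $size));
$params['body']['from'] = $start;
$params['body']['size'] = $size;
$results = $ElasticSearchClient->search($params);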

Assumptions for the above example:

  • You're running PHP
  • You're also using the PHP client

But you don't NEED to do this with PHP; the approach works with any client.

Upvotes: 0

Nariman

Reputation: 6426

This should be considerably faster than both answers above and supports seeding:

curl -XGET 'localhost:9200/_search' -d '{
  "query": {
    "function_score" : {
      "query" : { "match_all": {} },
      "random_score" : {}
    }
  }
}';

See: https://github.com/elasticsearch/elasticsearch/issues/1170
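
For pagination the seed has to stay the same across requests. Below is a minimal sketch in Python that only builds the request body. The function name `random_page_body` and the example values are made up for illustration, and the `seed` option inside `random_score` should be checked against your Elasticsearch version (as far as I know, newer releases also expect a `field` next to the seed).

```python
def random_page_body(seed, start, size):
    """Build a search body whose random scores are reproducible for a given seed."""
    return {
        "from": start,  # offset of the first hit on this page
        "size": size,   # number of hits per page
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                # Assumption: a fixed seed yields the same order on every request.
                "random_score": {"seed": seed},
            }
        },
    }


# Two pages of the same shuffle: only the offset changes, the seed stays put.
page_one = random_page_body(seed=42, start=0, size=10)
page_two = random_page_body(seed=42, start=10, size=10)
```

Send each body as the JSON payload of the `_search` request shown above; storing the seed in the user's session is one way to keep it stable while they page.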

Upvotes: 79

DavidGOrtega

Reputation: 287

Good solution from imotov.

Here is something much simpler that doesn't rely on a document property:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "Math.random()",
        "type" : "number",
        "params" : {},
        "order" : "asc"
    }
  }
}

If you want to set a range, that would be something like:

{
  "query" : { "query_string" : {"query" : "*:*"} },
  "sort" : {
    "_script" : { 
        "script" : "Math.random() * (myMax - myMin) + myMin",
        "type" : "number",
        "params" : {},
        "order" : "asc"
    }
  }
}

Replace myMax and myMin with your own values.

Upvotes: 25

Yeggeps

Reputation: 2105

I ended up solving it slightly differently from what imotov suggested. As I have multiple clients, I didn't want to implement the logic surrounding the salt string on every one of them.

I already had a randomized_key on the model. I also didn't need the order to be random for every request, so I made a scheduled job to update the randomized key every night and then sorted by that field in Elasticsearch.
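
Roughly, the two halves look like this in Python. This is only a sketch: the function names are hypothetical, the keys are plain random floats, and writing them back to the model and the index is left out.

```python
import random


def regenerate_keys(doc_ids, rng=None):
    """Nightly job: give every document a fresh random key in [0, 1)."""
    rng = rng or random.Random()
    return {doc_id: rng.random() for doc_id in doc_ids}


def randomized_key_page(start, size, field="randomized_key"):
    """Search body that pages through documents ordered by the stored key."""
    return {
        "from": start,
        "size": size,
        "sort": [{field: {"order": "asc"}}],
    }


# Hypothetical ids; in practice these come from the model.
keys = regenerate_keys(["a", "b", "c"], random.Random(1))
body = randomized_key_page(start=20, size=10)
```

Because the key is stored on the document, every client gets the same order by sending the same plain sort, and the order only changes when the job runs.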

Upvotes: 4
