Reputation: 96
Thanks in advance. I expose the situation first and in the end the solution.
I have a collection of 2M documents with the following mapping:
{
"image": {
"properties": {
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
},
"title": {
"type": "string"
},
"url": {
"type": "string"
}
}
}
}
I have a webpage which paginates through all the documents with the following search:
{
"from":STARTING_POSITION_NUMBER,
"size":15,
"sort" : [
{ "_id" : {"order" : "desc"}}
],
"query" : {
"match_all": {}
}
}
And a hit looks like this(note that the _id value is a hash of the url to prevent duplicated documents):
{
"_index": "images",
"_type": "image",
"_id": "2a750a4817bd1600",
"_score": null,
"_source": {
"url": "http://test.test/test.jpg",
"timestamp": "2014-02-13T17:01:40.442307",
"title": "Test image!"
},
"sort": [
null
]
}
This works pretty well. The only problem I have is that the documents appear sorted chronologically (The oldest documents appear on the first page, and the ones indexed more recently on the last page), but I want them to appear on a random order. For example, page 10 should always show always the same N documents, but they don't have to appear sorted by the date.
I though of something like sorting all the documents by their hash, which is kind of random and deterministic. How could I do it?
I've searched on the docs and the sorting api just works for sorting the results, not the full index. If I don't find a solution I will pick documents randomly and index them on a separated collection.
Thank you.
Upvotes: 1
Views: 3007
Reputation: 96
I solved it using the following search:
{
"from":STARTING_POSITION_NUMBER,
"size":15,
"query" : {
"function_score": {
"random_score": {
"seed" : 1
}
}
}
}
Thanks to David from the Elasticsearch mailing list for pointing out the function score with random scoring.
Upvotes: 4