I'm encountering an issue with Elasticsearch where I'm using a function_score query with a random_score function. When I apply the random_score to a match_all query, it works as expected, assigning random scores to the returned documents. However, when I apply the same random_score function within a bool query that filters documents based on the existence of a field, all documents return with a score of 0.0.
Here's the working script using match_all:
curl -X POST "https://[my_elasticsearch_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "function_score": {
          "functions": [
            { "random_score": {} }
          ],
          "query": {
            "match_all": {}
          }
        }
      }
    }
  }
}'
And here's the non-working script using a bool query to filter documents without a locked_until field:
curl -X POST "https://[my_elasticsearch_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                { "bool": { "must_not": { "exists": { "field": "locked_until" } } } }
              ],
              "minimum_should_match": 1
            }
          },
          "functions": [
            { "random_score": {} }
          ]
        }
      }
    }
  }
}'
The first script correctly assigns non-zero scores to documents, but the second script, despite successfully filtering documents, assigns them all a score of 0.0.
I'm puzzled as to why the random_score function behaves differently when applied to documents filtered through a bool query. Both scripts are executed against the same Elasticsearch version and cluster (details below).
Does anyone have insights into why this behavior occurs, or suggestions on how to ensure random_score assigns non-zero scores to filtered documents as well?
I am running OpenSearch with the following versions:
"version": {
"number": "7.10.2",
"build_type": "tar",
"build_hash": "2c355ce1a427e4a528778d4054436b5c4b756221",
"build_date": "2024-02-20T02:18:49.874618333Z",
"build_snapshot": false,
"lucene_version": "9.9.2",
"minimum_wire_compatibility_version": "7.10.0",
"minimum_index_compatibility_version": "7.0.0"
},
"tagline": "The OpenSearch Project: https://opensearch.org/"
[EDIT]
Based on the comment from @Val - thanks! - I have tried the following, which seems to be working better. Is this what you had in mind Val?
curl -X POST "https://[my_elasticsearch_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "bool": {
          "filter": [
            { "bool": { "must_not": { "exists": { "field": "locked_until" } } } }
          ],
          "should": [
            {
              "function_score": {
                "functions": [
                  { "random_score": {} }
                ]
              }
            }
          ]
        }
      }
    }
  }
}'
Okay - Val got me there with this link. Thanks @Val!
For completeness, I'm documenting the full answer in case it helps someone later on. I've made it a community wiki, so feel free to edit away.
I was trying to create a way to search for unclaimed documents at random while being able to "lock" a document for exclusive reading.
Solution Details
There are two scripts: one for searching unclaimed records (search_unclaimed_records) and another to claim a document (claim_document).
The search_unclaimed_records script searches for an unclaimed document at random. A document is considered unclaimed if it doesn't have the locked_until field or if the locked_until timestamp is in the past.
curl -X POST "https://[anonymized_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "should": [
                { "bool": { "must_not": { "exists": { "field": "locked_until" } } } },
                { "range": { "locked_until": { "lt": "now" } } }
              ],
              "minimum_should_match": 1
            }
          },
          "should": [
            {
              "function_score": {
                "functions": [
                  { "random_score": {"seed": "{{seed}}"} }
                ]
              }
            }
          ]
        }
      }
    }
  }
}'
Having the should in a filter block was the bit I had not understood.
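As far as I can tell, the zero scores in the original attempt come from how function_score combines scores: its default boost_mode is multiply, and a bool query made up only of must_not clauses scores every hit 0, so the random value ends up multiplied by 0. The sketch below is a simplified Python model of that arithmetic, not OpenSearch code; the function name combined_score is made up for illustration.

```python
import random


def combined_score(query_score, function_value, boost_mode="multiply"):
    """Simplified model of how function_score merges the two scores."""
    if boost_mode == "multiply":  # the default boost_mode
        return query_score * function_value
    if boost_mode == "replace":  # ignore the query score entirely
        return function_value
    raise ValueError(f"boost_mode not covered by this sketch: {boost_mode}")


rng = random.Random(1)
random_value = rng.random()  # stand-in for random_score, a value in [0, 1)

# Original attempt: the must_not-only bool query scores each hit 0.
broken = combined_score(0.0, random_value)

# Working version: the filter contributes nothing to the score, and a
# function_score without its own query wraps match_all, which scores 1.
working = combined_score(1.0, random_value)

print(broken, working)
```

Under this model, adding "boost_mode": "replace" to the original function_score query would presumably also produce non-zero scores, but that variant is untested here.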
The claim_document script locks the document for 5 minutes (or any specified duration), preventing it from being returned by the search_unclaimed_records script within that time frame.
curl -X POST "https://[anonymized_server]/_scripts/claim_document" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "painless",
    "source": "long lockDurationMs = params.lock_duration_ms != null ? params.lock_duration_ms : 5 * 60 * 1000; ctx._source.locked_until = (new Date().getTime() + lockDurationMs); ctx._source.owned_by = params.owned_by;"
  }
}'
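To make the locking rule easier to follow, here is a rough Python model of what the two scripts do to a document, working on a plain dict instead of a real index. The helper names is_unclaimed and claim are hypothetical, and timestamps are assumed to be epoch milliseconds, as in the Painless script.

```python
DEFAULT_LOCK_MS = 5 * 60 * 1000  # same default as the Painless script


def is_unclaimed(source, now_ms):
    """True if the doc has no locked_until or the lock has already expired."""
    locked_until = source.get("locked_until")
    return locked_until is None or locked_until < now_ms


def claim(source, owned_by, now_ms, lock_duration_ms=None):
    """Mirror of claim_document: set the lock expiry and the owner."""
    if lock_duration_ms is None:
        lock_duration_ms = DEFAULT_LOCK_MS
    source["locked_until"] = now_ms + lock_duration_ms
    source["owned_by"] = owned_by
    return source


doc = {"title": "example"}
now_ms = 1_000  # fixed clock so the example is reproducible

assert is_unclaimed(doc, now_ms)       # never locked, so it can be claimed
claim(doc, "worker-1", now_ms)
assert not is_unclaimed(doc, now_ms)   # locked for the next 5 minutes
assert is_unclaimed(doc, now_ms + DEFAULT_LOCK_MS + 1)  # lock has expired
```

A different lock duration can be requested by passing lock_duration_ms in the script params alongside owned_by.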
Usage
First, find an unclaimed document:
curl -X POST "https://[anonymized_server]/_search/template?pretty" -H 'Content-Type: application/json' -d '{
"id": "search_unclaimed_records",
"params": { "seed": 1 }
}'
Then use the _id, _seq_no and _primary_term from the returned hit to claim the document (the update endpoint needs the index name in the path):
curl -X POST "https://[anonymized_server]/[index_name]/_update/[document_id]" -H 'Content-Type: application/json' -d '{"script": {"id": "claim_document", "params": {"owned_by": "your_user_identifier"}}, "if_seq_no": [seq_no], "if_primary_term": [primary_term]}'
The seq_no and primary_term ensure that the document hasn't been updated by someone else between running the two queries; if it has, the update is rejected with a version conflict.
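The two steps can be chained with a retry when the claim loses the race. The following is a minimal Python sketch using only the standard library; BASE_URL, INDEX, http_post and claim_random_document are placeholder names I have assumed for illustration, and it treats HTTP 409 as the version-conflict response.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://localhost:9200"  # assumption: replace with your server
INDEX = "my-index"                   # assumption: hypothetical index name


def http_post(url, body):
    """POST a JSON body and return (status_code, parsed_json)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        return err.code, json.load(err)


def claim_random_document(owner, seed, post=http_post, max_attempts=5):
    """Return the id of the claimed document, or None if none was claimed."""
    for _ in range(max_attempts):
        status, result = post(
            f"{BASE_URL}/{INDEX}/_search/template",
            {"id": "search_unclaimed_records", "params": {"seed": seed}},
        )
        if status != 200:
            raise RuntimeError(f"search failed with status {status}")
        hits = result["hits"]["hits"]
        if not hits:
            return None  # no unclaimed documents left
        hit = hits[0]
        status, _ = post(
            f"{BASE_URL}/{INDEX}/_update/{hit['_id']}",
            {
                "script": {"id": "claim_document", "params": {"owned_by": owner}},
                "if_seq_no": hit["_seq_no"],
                "if_primary_term": hit["_primary_term"],
            },
        )
        if status == 200:
            return hit["_id"]
        if status != 409:  # 409: the document changed since the search
            raise RuntimeError(f"claim failed with status {status}")
    return None
```

Calling claim_random_document("your_user_identifier", seed=1) would run the search and the claim against the configured server. The post parameter exists only so the HTTP layer can be swapped out, for example with a stub in tests.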