Tom

Reputation: 5309

Elasticsearch Random Score Returns 0 for Filtered Query but Works with Match All

I'm encountering an issue with Elasticsearch where I'm using a function_score query with a random_score function. When I apply the random_score to a match_all query, it works as expected, assigning random scores to the returned documents. However, when I apply the same random_score function within a bool query that filters documents based on the existence of a field, all documents return with a score of 0.0.

Here's the working script using match_all:

curl -X POST "https://[my_elasticsearch_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "function_score": {
          "functions": [
            { "random_score": {} }
          ],
          "query": {
            "match_all": {}
          }
        }
      }
    }
  }
}'

And here's the non-working script using a bool query to filter documents without a locked_until field:

curl -X POST "https://[my_elasticsearch_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                { "bool": { "must_not": { "exists": { "field": "locked_until" } } } }
              ],
              "minimum_should_match": 1
            }
          },
          "functions": [
            { "random_score": {} }
          ]
        }
      }
    }
  }
}' 

The first script correctly assigns non-zero scores to documents, but the second script, despite successfully filtering documents, assigns them all a score of 0.0.

I'm puzzled as to why the random_score function behaves differently when applied to documents filtered through a bool query. Both scripts are executed against the same Elasticsearch version and cluster (details below).

Does anyone have insights into why this behavior occurs, or suggestions on how to ensure random_score assigns non-zero scores to filtered documents as well?


I am running OpenSearch, which reports the following version information:

"version": {
        "number": "7.10.2",
        "build_type": "tar",
        "build_hash": "2c355ce1a427e4a528778d4054436b5c4b756221",
        "build_date": "2024-02-20T02:18:49.874618333Z",
        "build_snapshot": false,
        "lucene_version": "9.9.2",
        "minimum_wire_compatibility_version": "7.10.0",
        "minimum_index_compatibility_version": "7.0.0"
    },
    "tagline": "The OpenSearch Project: https://opensearch.org/"

[EDIT]

Based on the comment from @Val - thanks! - I have tried the following, which seems to be working better. Is this what you had in mind, Val?

curl -X POST "https://[my_elasticsearch_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "bool": {
          "filter": [
            { "bool": { "must_not": { "exists": { "field": "locked_until" } } } }
          ],
          "should": [
            {
              "function_score": {
                "functions": [
                  { "random_score": {} }
                ]
              }
            }
          ]
        }
      }
    }
  }
}'

Upvotes: 1

Views: 252

Answers (1)

Tom

Reputation: 5309

Okay - Val got me there with this link. Thanks @Val!

For completeness, I am documenting the full answer in case it helps someone later on. I have made it a community wiki, so feel free to edit away.

I was trying to create a way to search for unclaimed documents at random while being able to "lock" a document for exclusive reading.

Solution Details

There are two stored scripts: one for searching unclaimed records (search_unclaimed_records) and another to claim a document (claim_document).

  1. Search Unclaimed Record Script

This script searches for an unclaimed document at random. A document is considered unclaimed if it doesn't have the locked_until field or if the locked_until timestamp is in the past.

curl -X POST "https://[anonymized_server]/_scripts/search_unclaimed_records" -H 'Content-Type: application/json' -d '
{
  "script": {
    "lang": "mustache",
    "source": {
      "size": 1,
      "seq_no_primary_term": true,
      "query": {
        "bool": {
          "filter": {
            "bool": {
              "should": [
                { "bool": { "must_not": { "exists": { "field": "locked_until" } } } },
                { "range": { "locked_until": { "lt": "now" } } }
              ],
              "minimum_should_match": 1
            }
          },
          "should": [
            {
              "function_score": {
                "functions": [
                  { "random_score": {"seed": "{{seed}}"} }
                ]
              }
            }
          ]
        }
      }
    }
  }
}'

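The bit I had not understood was the structure: the locked_until conditions (the inner should clauses) belong inside a filter block, while the function_score sits in a separate should clause of the outer bool, which is what produces the score.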
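As far as I can tell, the 0.0 scores in the original query come from the default boost_mode of function_score, which is multiply: the bool query built only from must_not seems to score every match 0.0, and 0.0 times any random value is still 0.0. Under that assumption, an alternative that keeps the original shape is to set boost_mode to replace, so the random value is used directly. The following is an untested sketch, not something from Val's link; the index placeholder is an assumption.

```shell
# Sketch: same must_not condition as the non-working query, but boost_mode
# "replace" makes the final score equal to the random_score value instead of
# (query score * random value).
SEARCH_BODY=$(cat <<'JSON'
{
  "size": 1,
  "seq_no_primary_term": true,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must_not": { "exists": { "field": "locked_until" } }
        }
      },
      "functions": [
        { "random_score": {} }
      ],
      "boost_mode": "replace"
    }
  }
}
JSON
)

# Print the body so it can be inspected before sending it anywhere.
printf '%s\n' "$SEARCH_BODY"

# To try it on a real cluster (server and index names are placeholders):
# curl -s -X POST "https://[my_elasticsearch_server]/[index_name]/_search?pretty" \
#   -H 'Content-Type: application/json' -d "$SEARCH_BODY"
```

The filter-plus-should layout above achieves the same effect because the filter contributes nothing to the score and the function_score in the should clause has no zero-scoring query to be multiplied with.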

  2. Claim Document Script

This script locks the document for 5 minutes by default (or for the duration passed in the lock_duration_ms parameter), preventing it from being returned by the search_unclaimed_records script within that time frame.

curl -X POST "https://[anonymized_server]/_scripts/claim_document" -H 'Content-Type: application/json' -d'
{
  "script": {
    "lang": "painless",
    "source": "long lockDurationMs = params.lock_duration_ms != null ? params.lock_duration_ms : 5 * 60 * 1000; ctx._source.locked_until = (new Date().getTime() + lockDurationMs); ctx._source.owned_by = params.owned_by;"
  }
}'
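The script stores locked_until as epoch milliseconds (current time plus the lock duration). The arithmetic can be reproduced outside the cluster with a small helper; lock_expiry_ms is a hypothetical name used only for illustration. Note that the range filter with "now" in the search script presumably relies on locked_until being mapped as a date field, which is an assumption here since the mapping is not shown.

```shell
# Mirrors the arithmetic in claim_document: now + lock duration, in milliseconds.
# lock_expiry_ms is a hypothetical helper, not part of the stored script.
lock_expiry_ms() {
  local now_ms="$1"
  local duration_ms="${2:-$((5 * 60 * 1000))}"   # default: 5 minutes, as in the script
  echo $((now_ms + duration_ms))
}

now_ms=$(( $(date +%s) * 1000 ))
echo "locked_until with default lock: $(lock_expiry_ms "$now_ms")"
echo "locked_until with a 30 s lock:  $(lock_expiry_ms "$now_ms" 30000)"
```

To get the 30 second lock from the stored script itself, pass "lock_duration_ms": 30000 alongside owned_by in the params of the update request.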

Usage

First, find an unclaimed document:

curl -X POST "https://[anonymized_server]/_search/template?pretty" -H 'Content-Type: application/json' -d '{
  "id": "search_unclaimed_records",
  "params": { "seed": 1 }
}'

Then, use the _id, _seq_no and _primary_term values from the returned hit to claim the document:

curl -X POST "https://[anonymized_server]/[index_name]/_update/[document_id]" -H 'Content-Type: application/json' -d '{"script": {"id": "claim_document", "params": {"owned_by": "your_user_identifier"}}, "if_seq_no": [seq_no], "if_primary_term": [primary_term]}'

The if_seq_no and if_primary_term conditions ensure that the document hasn't been updated by someone else between the two requests. If it has, the update is rejected with a version conflict and the search should be run again.
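The two steps can be chained in a small script. This is a minimal sketch, assuming curl and jq are installed; ES_URL, INDEX, build_claim_body and claim_one are hypothetical names that do not come from the setup above. It passes a different seed on each call, on the assumption that a fixed seed such as 1 would make concurrent workers keep drawing the same document.

```shell
# Placeholders: point these at your own cluster and index.
ES_URL="${ES_URL:-https://localhost:9200}"
INDEX="${INDEX:-my_index}"

# Build the JSON body for the claim request from seq_no, primary_term and owner.
build_claim_body() {
  local seq_no="$1" primary_term="$2" owner="$3"
  printf '{"script":{"id":"claim_document","params":{"owned_by":"%s"}},"if_seq_no":%s,"if_primary_term":%s}' \
    "$owner" "$seq_no" "$primary_term"
}

# Find one unclaimed document and try to claim it. Prints the HTTP status code
# of the update; 409 means another worker changed the document first.
claim_one() {
  local owner="$1" hit id seq_no primary_term
  hit=$(curl -s -X POST "$ES_URL/$INDEX/_search/template" \
    -H 'Content-Type: application/json' \
    -d "{\"id\":\"search_unclaimed_records\",\"params\":{\"seed\":$(date +%s)}}" \
    | jq -c '.hits.hits[0] // empty')
  [ -n "$hit" ] || return 1   # nothing unclaimed right now
  id=$(printf '%s' "$hit" | jq -r '._id')
  seq_no=$(printf '%s' "$hit" | jq -r '._seq_no')
  primary_term=$(printf '%s' "$hit" | jq -r '._primary_term')
  curl -s -o /dev/null -w '%{http_code}\n' -X POST "$ES_URL/$INDEX/_update/$id" \
    -H 'Content-Type: application/json' \
    -d "$(build_claim_body "$seq_no" "$primary_term" "$owner")"
}

# Example: claim_one "your_user_identifier"
```

A caller would typically retry claim_one when it prints 409, since that only means the race for that particular document was lost.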

Upvotes: 1
