Sterling Duchess

Reputation: 2080

elasticsearch - decay documents using property value

My documents are assigned to categories. There are 40 different categories, which are added to each document manually in the database and then indexed. This is what my document looks like:

{
  "name": "..",
  "categoryA": "..",
  "categoryB": "..",..
  "categoryDecayScore": 0.0 - 1.0
}

A document is considered well covered if it belongs to all 40 categories. To push documents that are in all categories to the top, I wanted to use a decay function to reduce the score of documents that belong to fewer categories.

For this I use the categoryDecayScore property, which is set at index time. If a document is part of all 40 categories, its categoryDecayScore is 0.0; if it is missing half of them but is still in more than 1/3, it gets 0.2; and if it is in fewer than 1/3, it gets 0.3.

I also increase categoryDecayScore by 0.02 for less relevant documents.
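As a rough illustration of the index-time rule above, here is a minimal Python sketch. The thresholds are one reading of the description (anything short of all 40 categories but above 1/3 gets 0.2), and `category_decay_score` and `relevance_penalties` are hypothetical names. `relevance_penalties` counts how many 0.02 increments apply, which would account for values such as 0.04 in the results further down.

```python
# Sketch of the index-time categoryDecayScore rule described in the question.
# Assumptions (not stated in the question): any document in fewer than all 40
# categories but more than 1/3 of them gets 0.2, and `relevance_penalties` is
# a hypothetical count of 0.02 increments for "less relevant" documents.

TOTAL_CATEGORIES = 40


def category_decay_score(category_count: int, relevance_penalties: int = 0) -> float:
    """Return the categoryDecayScore to store on a document at index time."""
    if category_count >= TOTAL_CATEGORIES:
        score = 0.0  # well covered: in every category
    elif category_count * 3 > TOTAL_CATEGORIES:
        score = 0.2  # more than 1/3 of the categories
    else:
        score = 0.3  # fewer than 1/3 (40/3 is not an integer, so no exact tie)
    # Each hypothetical relevance penalty adds 0.02; round away float noise.
    return round(score + 0.02 * relevance_penalties, 2)


if __name__ == "__main__":
    for count in (40, 25, 10):
        print(count, category_decay_score(count), category_decay_score(count, 1))
```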

What I want to do:
I would like documents that have categoryDecayScore > 0.0 to have their score decayed the farther they are from 0.0.

This is my filter function:

"filter": {
        "exp": {
          "categoryDecayScore" : {
            "origin" : 0.0,
            "scale" : 1.0,
            "offset" : 0.0,
            "decay" : 0.5
          }
        }
}

My understanding of the documentation here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

is that origin is my point of reference, so all documents with categoryDecayScore > 0.0 will be decayed, and any with categoryDecayScore >= 1.0 will be decayed by a factor of 0.5.
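For reference, the exp decay function in the function_score documentation is defined as below. Plugging in the settings from the filter function above (a hand-worked example, not output from Elasticsearch) shows that a document at 0.3 should receive a factor of roughly 0.81, not 1, if the function is actually applied:

```latex
S(\text{doc}) = \exp\bigl(\lambda \cdot \max(0,\ |v_{\text{doc}} - \text{origin}| - \text{offset})\bigr),
\qquad \lambda = \frac{\ln(\text{decay})}{\text{scale}}

\text{origin}=0,\ \text{offset}=0,\ \text{scale}=1,\ \text{decay}=0.5
\;\Rightarrow\; S = e^{v \ln 0.5} = 0.5^{v},
\qquad S(0.3) \approx 0.812,\quad S(1.0) = 0.5
```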

However, looking at my results, this does not seem to take effect. The top 4 documents all have the same score, yet their categoryDecayScore values differ:

{
  _score: 51.970146,
  categoryDecayScore: 0.04
},
{
  _score: 51.970146,
  categoryDecayScore: 0.2
},
{
  _score: 51.970146,
  categoryDecayScore: 0.02
},
{
  _score: 51.970146,
  categoryDecayScore: 0.3
}

Is this normal behaviour, or am I understanding the decay function incorrectly? My assumption based on the docs is:

Note 1:

Using the explain flag, I noticed that with those exp settings the evaluated decay score is always 1, so the 51.. score is just the text-matching score.
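To see what the decay should have produced, here is a minimal Python sketch that reimplements the documented exp formula (`exp_decay` is a hypothetical helper, not an Elasticsearch API) and applies it to the four categoryDecayScore values above. It assumes the default boost_mode of multiply, in which case a working decay would give four different scores instead of four identical ones:

```python
import math


# Hypothetical reimplementation of the documented "exp" decay formula, used
# only to predict the factors the question's settings should produce.
def exp_decay(value: float, origin: float = 0.0, scale: float = 1.0,
              offset: float = 0.0, decay: float = 0.5) -> float:
    """Return exp(lam * max(0, |value - origin| - offset)), lam = ln(decay) / scale."""
    lam = math.log(decay) / scale
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(lam * distance)


if __name__ == "__main__":
    text_score = 51.970146  # text-matching score shared by the top 4 hits
    for v in (0.02, 0.04, 0.2, 0.3):
        factor = exp_decay(v)
        # Assuming boost_mode "multiply": final score = query score * function score.
        print(f"categoryDecayScore={v}: decay={factor:.4f}, "
              f"expected _score ~ {text_score * factor:.4f}")
```

Because explain reports a decay factor of exactly 1 for all of these values, the exp clause is evidently not being evaluated as a scoring function at all.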

Upvotes: 7

Views: 756

Answers (2)

Sterling Duchess

Reputation: 2080

My query is/was correct. The issue was that my range of 0.0 to 1.0 was too small, so I decided to use whole integers instead of decimals, ranging from 0 to 1000. For the exclusion I then set the origin to 100 instead of 0. This returned the expected result.

Upvotes: 2

Avish

Reputation: 4626

Your understanding of the decay function parameters is correct. However, in your post you put the decay function (exp) clause inside the filter clause, which is wrong: filters are only used to remove documents from the recall set and cannot affect their score.

To use a decay function, you need to include it inside a function_score query. In your case you need something like:

{
  "query": {
    "function_score": {
      "exp": {
        "categoryDecayScore": {
          "origin" : 0.0,
          "scale" : 1.0,
          "offset" : 0.0,
          "decay" : 0.5
        }
      }
    }
  }
}

If you only want this decay to affect documents having a categoryDecayScore > 0, you can add a filter to the decay function:

{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {
            "range": {
              "categoryDecayScore": {
                "gt": 0.0
              }
            }
          },
          "exp": {
            "categoryDecayScore": {
              "origin" : 0.0,
              "scale" : 1.0,
              "offset" : 0.0,
              "decay" : 0.5
            }
          }
        }
      ]
    }
  }
}
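The examples above have no inner query, so function_score falls back to match_all and the text-matching score from the question would be lost. A minimal Python sketch of building a request body that keeps the existing text query and applies the filtered decay on top of it follows; `decayed_query` and the match query on name are hypothetical placeholders for whatever query you already run. offset and decay are omitted because they default to 0 and 0.5:

```python
import json


def decayed_query(text_query: dict) -> dict:
    """Wrap an existing query in function_score with a filtered exp decay."""
    return {
        "query": {
            "function_score": {
                # The original text query still produces the base relevance score.
                "query": text_query,
                "functions": [
                    {
                        # Only decay documents that are not in all 40 categories.
                        "filter": {"range": {"categoryDecayScore": {"gt": 0.0}}},
                        "exp": {
                            "categoryDecayScore": {"origin": 0.0, "scale": 1.0}
                        },
                    }
                ],
                # Multiply the query score by the function score (the default).
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical text query; substitute the query you already use.
    body = decayed_query({"match": {"name": "example"}})
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body as-is.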

Also note that offset is 0 by default and decay is 0.5 by default, so you don't have to explicitly include those parameters.

The documentation for Decay Functions under the Function Score Query section has examples of the correct syntax and explanations about the defaults.

Upvotes: 0
