Reputation: 2080
My documents are made up of categories. There are 40 different categories; these are added to each document manually in the database and then indexed. This is what my document looks like:
{
  "name": "..",
  "categoryA": "..",
  "categoryB": "..",..
  "categoryDecayScore": 0.0 - 1.0
}
A document is considered well covered if it is part of all 40 categories. To push documents that belong to all categories to the top, I want to use a decay function to reduce the score of documents that are part of fewer categories.
For this I use the categoryDecayScore property, which is set at index time. If a document is part of all 40 categories, its categoryDecayScore is 0.0; if it is missing half of them but still has more than 1/3, it gets a score of 0.2; and if it has fewer than 1/3, it gets a score of 0.3.
Then I also increase categoryDecayScore by 0.02 for less relevant scores.
What I want to do:
I would like documents with categoryDecayScore > 0.0 to have their score decayed more the farther they are from 0.0.
This is my filter function:
"filter": {
  "exp": {
    "categoryDecayScore" : {
      "origin" : 0.0,
      "scale" : 1.0,
      "offset" : 0.0,
      "decay" : 0.5
    }
  }
}
The way I understand the documentation here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
is that origin is my point of reference, so all documents with categoryDecayScore > 0.0 will be decayed, and any with categoryDecayScore >= 1.0 will be decayed by 0.5.
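For reference, the function score documentation describes the exponential decay as multiplying the score by exp(λ · max(0, |value − origin| − offset)), with λ = ln(decay) / scale. The minimal Python sketch below re-implements that formula to show which multipliers the settings above should produce for the categoryDecayScore values in the results. The name `exp_decay` is hypothetical and is not an Elasticsearch API; the real computation happens inside Elasticsearch, in single precision.

```python
import math

def exp_decay(value, origin=0.0, scale=1.0, offset=0.0, decay=0.5):
    """Sketch of the documented exp decay multiplier (hypothetical helper)."""
    # lambda is chosen so the multiplier equals `decay` at distance `scale` beyond `offset`
    lam = math.log(decay) / scale
    # distances inside `offset` are not decayed at all
    distance = max(0.0, abs(value - origin) - offset)
    return math.exp(lam * distance)

# Multipliers for the categoryDecayScore values seen in the results, plus both ends of the range
for v in (0.0, 0.02, 0.04, 0.2, 0.3, 1.0):
    print(f"categoryDecayScore={v}: multiplier={exp_decay(v):.4f}")
```

With these settings a document at 0.3 should keep roughly 81% of its score and a document at 1.0 exactly half, so the four hits should not end up with identical scores if the function is actually applied.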
However, looking at my results, it seems this does not take effect. The top 4 documents all have the same score, yet here are their categoryDecayScore values:
{
  _score: 51.970146,
  categoryDecayScore: 0.04
},
{
  _score: 51.970146,
  categoryDecayScore: 0.2
},
{
  _score: 51.970146,
  categoryDecayScore: 0.02
},
{
  _score: 51.970146,
  categoryDecayScore: 0.3
}
Is this normal behaviour, or am I understanding the decay function incorrectly? My assumption based on the docs is:
Note 1:
Using the explain flag, I noticed that with those exp settings the evaluated decay score is always 1, so the 51.. score is just the text-matching score.
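If the decay were really applied, function_score's default boost_mode of multiply would multiply the text score by the decay value. The Python sketch below assumes the 51.970146 text score shared by the four hits and uses the documented exp formula, re-implemented here so the block stands alone, to show that the hits would then stop tying. A constant multiplier of 1 in the explain output is exactly what you would see if the function never took part in scoring. The helper names are hypothetical, and the printed values only approximate what Elasticsearch would report.

```python
import math

TEXT_SCORE = 51.970146  # text-matching score shared by the top 4 hits

def exp_decay(value, origin=0.0, scale=1.0, offset=0.0, decay=0.5):
    # Documented exp decay: exp(ln(decay) / scale * max(0, |value - origin| - offset))
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, abs(value - origin) - offset))

def combined_score(text_score, decay_value):
    # boost_mode "multiply" (the function_score default): query score * function score
    return text_score * exp_decay(decay_value)

hits = [0.04, 0.2, 0.02, 0.3]  # categoryDecayScore values of the tied hits, in result order
ranked = sorted(hits, key=lambda v: combined_score(TEXT_SCORE, v), reverse=True)
for v in ranked:
    print(f"categoryDecayScore={v}: _score ~ {combined_score(TEXT_SCORE, v):.4f}")
```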
Upvotes: 7
Views: 756
Reputation: 2080
My query is/was correct. The issue was that my range of 0.0 - 1.0 was too small, so I decided to use whole integers instead of decimals, ranging from 0 to 1000. For the exclusion, I then set the origin to 100 instead of 0. This returned the expected result.
Upvotes: 2
Reputation: 4626
Your understanding of the decay function parameters is correct. However, in your post you put the decay function (exp) clause inside the filter clause, which is wrong: filters only remove documents from the set of matching documents and cannot affect their score.
To use a decay function, you need to include it inside a function_score query.
In your case you need something like the following (your existing text query would go under a "query" key inside function_score):
{
  "query": {
    "function_score": {
      "exp": {
        "categoryDecayScore": {
          "origin" : 0.0,
          "scale" : 1.0,
          "offset" : 0.0,
          "decay" : 0.5
        }
      }
    }
  }
}
If you only want this decay to affect documents having a categoryDecayScore > 0, you can add a filter to the decay function:
{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {
            "range": {
              "categoryDecayScore": {
                "gt": 0.0
              }
            }
          },
          "exp": {
            "categoryDecayScore": {
              "origin" : 0.0,
              "scale" : 1.0,
              "offset" : 0.0,
              "decay" : 0.5
            }
          }
        }
      ]
    }
  }
}
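Note that with origin 0.0 and offset 0.0 the exp function already returns a multiplier of 1 for a document at exactly 0.0, so in this particular setup the range filter does not change any score. The Python sketch below compares both variants under one stated assumption: a document that matches no function's filter keeps a multiplier of 1. The helper names are hypothetical and only mirror the documented formula.

```python
import math

def exp_decay(value, origin=0.0, scale=1.0, offset=0.0, decay=0.5):
    # Documented exp decay: exp(ln(decay) / scale * max(0, |value - origin| - offset))
    lam = math.log(decay) / scale
    return math.exp(lam * max(0.0, abs(value - origin) - offset))

def multiplier_unfiltered(v):
    # Decay applied to every document
    return exp_decay(v)

def multiplier_filtered(v):
    # Decay applied only where the range filter (gt 0.0) matches.
    # Assumption: a document matching no function keeps a multiplier of 1.
    return exp_decay(v) if v > 0.0 else 1.0

values = [0.0, 0.02, 0.04, 0.2, 0.3, 1.0]
for v in values:
    print(v, multiplier_unfiltered(v), multiplier_filtered(v))
```

The filter becomes meaningful once the origin is moved away from the documents you want to leave untouched, for example an origin of 100 on a 0 to 1000 scale.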
Also note that offset defaults to 0 and decay defaults to 0.5, so you don't have to include those parameters explicitly.
The documentation for Decay Functions under the Function Score Query section has examples of the correct syntax and explanations of the defaults.
Upvotes: 0