Reputation: 3827
I'm using Elasticsearch to search multiple array fields in my type, which looks something like this:
t1 = { field1: ["foo", "bar"],
field2: ["foo", "foo", "foo", "foo"],
field3: ["foo", "foo", "foo", "foo", "foo", "foo"]
}
I'm then using a multi_match query to get matches, something along the lines of:
multi_match: { query: "foo",
fields: "field*"
}
When computing the score of t1, Elasticsearch adds up the scores from field1, field2 and field3, which is what I want. However, the fields do not contribute equally: field3 contributes the most, since "foo" occurs there multiple times.
Now I want to compute the score within each array field not by adding up the scores of all array entries, but by taking just their maximum. In my example, every field would then have the same score, since each contains at least one exact match.
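To make the desired behaviour concrete, here is a toy sketch in plain JavaScript. It is not how Elasticsearch actually scores documents: it assumes every array entry that equals the query term scores 1 and everything else scores 0, and the helper names (entryScores, fieldScoreSum, fieldScoreMax, perFieldScores) are made up for illustration.

```javascript
// Toy model only: an entry equal to the term scores 1, anything else 0.
// Real Elasticsearch relevance scoring is more involved than this.
const t1 = {
  field1: ["foo", "bar"],
  field2: ["foo", "foo", "foo", "foo"],
  field3: ["foo", "foo", "foo", "foo", "foo", "foo"],
};

// Score each array entry independently.
function entryScores(values, term) {
  return values.map((v) => (v === term ? 1 : 0));
}

// Behaviour described above: entries within a field are summed.
function fieldScoreSum(values, term) {
  return entryScores(values, term).reduce((a, b) => a + b, 0);
}

// Desired behaviour: only the best entry within a field counts.
function fieldScoreMax(values, term) {
  return Math.max(0, ...entryScores(values, term));
}

// Apply one of the two field-level scorers to every field of a document.
function perFieldScores(doc, term, fieldScore) {
  const out = {};
  for (const [name, values] of Object.entries(doc)) {
    out[name] = fieldScore(values, term);
  }
  return out;
}

const summed = perFieldScores(t1, "foo", fieldScoreSum);
const maxed = perFieldScores(t1, "foo", fieldScoreMax);
console.log(summed); // { field1: 1, field2: 4, field3: 6 }
console.log(maxed);  // { field1: 1, field2: 1, field3: 1 }
```

With summing, field3 dominates the total; with the maximum, each field contributes the same amount and the per-field scores can still be added together.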
This question was already asked on the elasticsearch forum, but has not been answered so far.
Upvotes: 4
Views: 1330
Reputation: 10473
I've been stumped on this myself. It really seems like there should be a simple, built-in way to specify max instead of sum.
I'm not sure the workaround below is exactly what you're going for, because you lose the match score of any particular item in the array. You're not getting the max of the match scores of the individual items, just a boolean value indicating whether anything in the field matches. If you need something more nuanced (say a person's full name, where you want a better score for matching both first and last name than for matching just one of them), this may not be acceptable, because you're throwing out the scores.
If it is acceptable, this workaround seems to work:
{function_score: {
  query: {bool: {should: [
    {term: {field1: 'foo'}},
    {term: {field2: 'foo'}},
    {term: {field3: 'foo'}},
  ]}},
  functions: [
    {filter: {term: {field1: 'foo'}}, weight: 1},
    {filter: {term: {field2: 'foo'}}, weight: 1},
    {filter: {term: {field3: 'foo'}}, weight: 1},
  ],
  score_mode: 'sum',
  boost_mode: 'replace',
}}
We need the "query" part to give us the results to further filter, even though we discard its score. It seems like this should really be a filter, but just wrapping the same thing in a filtered query doesn't work. There may be a better option here.
Then, the weight functions basically give a 1 if there's a match on that field and 0 otherwise. The score_mode tells it to sum those weights; in your case all three fields match, so we get 3. The boost_mode tells it how to combine the result with the original query score: "replace" tells it to ignore the original query score (which has the problem you mentioned, that multiple matches in an array are summed). So the total score of this query is 3, because 3 fields match.
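If you have more than a handful of fields, writing the clauses out by hand gets tedious and error-prone. Here is a minimal sketch that builds the same request body for any list of fields. The helper names buildMatchCountQuery and expectedScore are made up for this example, and expectedScore is only a toy check of the score I'd expect, assuming a term filter matches an array field when any entry equals the term. It does not talk to Elasticsearch.

```javascript
// Hypothetical helper: builds the function_score body shown above
// for an arbitrary list of field names.
function buildMatchCountQuery(fields, term) {
  const clauses = fields.map((f) => ({ term: { [f]: term } }));
  return {
    function_score: {
      // The bool/should query only selects candidates; its score is discarded.
      query: { bool: { should: clauses } },
      // One weight function per field: contributes 1 if the field matches.
      functions: clauses.map((c) => ({ filter: c, weight: 1 })),
      score_mode: "sum",      // add up the per-field weights
      boost_mode: "replace",  // ignore the original query score
    },
  };
}

// Toy check, not Elasticsearch itself: counts the fields that contain
// at least one entry equal to the term.
function expectedScore(doc, fields, term) {
  return fields.filter((f) => (doc[f] || []).includes(term)).length;
}

const fieldNames = ["field1", "field2", "field3"];
const exampleDoc = {
  field1: ["foo", "bar"],
  field2: ["foo", "foo", "foo", "foo"],
  field3: ["foo", "foo", "foo", "foo", "foo", "foo"],
};

const body = buildMatchCountQuery(fieldNames, "foo");
console.log(JSON.stringify(body, null, 2));
console.log(expectedScore(exampleDoc, fieldNames, "foo")); // 3
```

Generating the query and the functions from the same list also keeps the two sets of clauses in sync, so no field gets listed twice or left out.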
This seems more complicated than it should be, but in my relatively limited testing I haven't noticed any performance issues. I'd love to see a better answer if someone more familiar with Elasticsearch has one.
Upvotes: 2