Reputation: 421
I'm using the following query to search through a database of names, allowing fuzzy matching but giving preference to exact matches.
"query": {
  "bool": {
    "should": [
      {
        "match": {
          "name": {
            "query": "x",
            "operator": "and",
            "boost": 10
          }
        }
      },
      {
        "match": {
          "name": {
            "query": "x",
            "fuzziness": "AUTO",
            "operator": "and"
          }
        }
      },
      {
        "match": {
          "altname": {
            "query": "x",
            "fuzziness": "AUTO",
            "operator": "and"
          }
        }
      }
    ]
  }
}
The database contains entries with identical names. When that happens, I would like to boost those entries by a second field, let's call it weight. However, I only want the boost to be applied within the subset of results that have a (near) identical score, not across all of the results.
This is further complicated by the fact that results with an identical name may receive slightly different scores, as they are also influenced by the relevancy of the altname field.
For example, querying for dog could give 3 results:
I'm looking for a query that would boost the result with id 2 to the top score. The result with id 3 should always stay at the bottom due to its poor relevancy, regardless of its weight. Ideally with tunable parameters to tweak the factor of the score vs. the factor of the weight.
Any way to do this in a single pass in Elasticsearch, of course without ruining performance?
Upvotes: 0
Views: 413
Reputation: 421
Looks like I figured it out.
First, I realised that the example in my original question was more complex than necessary. I narrowed it down to: "How to compose a query for 'blub' that returns the following documents in the order 2, 1, 3"
id: 1
name: blub
weight: 0.01
---
id: 2
name: blub
weight: 0.1
---
id: 3
name: blub stuff
weight: 1
Thus: for the two documents with an identical (or very similar) score, the weight should be used as a tie-breaker. But documents with a significantly lower score should never be allowed to trump other results, regardless of their weight.
I loaded the data in the excellent Play tool: https://www.found.no/play/gist/edd93c69c015d4c62366#search and started experimenting.
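It turned out that the log2p modifier did exactly what I wanted. It takes log10(weight + 2), so the weight only nudges the score, and by default the result is multiplied into the query score. I repeated the experiment on a real-world dataset and the ranking looks as expected.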
function_score:
  query:
    match:
      name: blub
  field_value_factor:
    field: weight
    modifier: log2p
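To see why this works as a tie-breaker, here is a rough Python sketch of the arithmetic. It assumes the default boost_mode (multiply) and that log2p is log10(factor * weight + 2). The query_score values are made up for illustration (the real ones depend on the index), and log2p_factor and rank are just helper names I picked for this sketch.

```python
import math


def log2p_factor(weight, factor=1.0):
    """Mimic field_value_factor with modifier log2p: log10(factor * weight + 2)."""
    return math.log10(factor * weight + 2)


# Hypothetical query scores: the two exact matches tie, while "blub stuff"
# scores lower because the field is longer. These numbers are assumptions.
docs = [
    {"id": 1, "name": "blub", "weight": 0.01, "query_score": 1.0},
    {"id": 2, "name": "blub", "weight": 0.1, "query_score": 1.0},
    {"id": 3, "name": "blub stuff", "weight": 1.0, "query_score": 0.625},
]


def rank(docs, factor=1.0):
    """Return document ids ordered by query_score * log2p(weight), best first."""
    scored = [
        (d["query_score"] * log2p_factor(d["weight"], factor), d["id"])
        for d in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


print(rank(docs))              # weight only breaks the tie between ids 1 and 2
print(rank(docs, factor=100))  # a large factor lets weight dominate relevancy
```

With these made-up scores, the multiplier ranges only from about 0.30 to 0.48, so id 3 stays last despite its high weight. The factor parameter of field_value_factor looks like the knob for tuning score vs. weight: raising it widens that range, and at some point the weight overrides relevancy.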
Upvotes: 0