Reputation: 3301
I have created an Elasticsearch index of documents that have a title field and a text field. Given a query, the desired behavior for my search is that it first checks the title field and, if there are any documents where the title is a "good" match with the query, those documents must be ranked at the top. Only after the good title matches should documents with a good text match be returned.
By "good title match" I mean something along the lines of "the query is close to some subset of the title, where close means a Levenshtein distance less than some given number". This is a threshold condition: either a title is a "good" match and should rank high, or it is not and should receive no benefit for getting "some" match with the query. The outcome is binary.
So if there is a query "How to garden like the best", a document with the title "garden like the best" should be ranked first, followed by documents that have a good match for the query in their "text" field. A document with title "Budget Gardening" should receive no bonus for having "Gardening" in its title, because it is not a good enough match.
Here is my attempt. It uses the Python elasticsearch_dsl library, but the JSON equivalent should be obvious.
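from elasticsearch_dsl import Search

# `query` holds the user's search string
s = Search()
initiated = s.query(
    "multi_match",
    query=query,
    # weight title matches far above text matches
    fields=['title^280', 'text^1'],
    type='best_fields',
    fuzziness='AUTO')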
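For readers not using the DSL, the call above should correspond roughly to the following request body, written here as a plain Python dict. The name `user_query` is a placeholder introduced for illustration.

```python
# Placeholder search string, used only for illustration
user_query = "How to garden like the best"

# Rough request-body equivalent of the multi_match call above
multi_match_body = {
    "query": {
        "multi_match": {
            "query": user_query,
            "fields": ["title^280", "text^1"],
            "type": "best_fields",
            "fuzziness": "AUTO",
        }
    }
}
```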
As you can see, I've done a multi match where I've given the "title" field a much higher importance. I've also allowed for some fuzziness for not knowing the exact spelling of the words in the title. The index is also stemmed. This approach has been mostly successful, but I've had two undesirable behaviors:
How can I adapt my query to obtain the desired behavior? Thank you.
Upvotes: 0
Views: 459
Reputation: 320
I haven't really tested it, but the function_score query (https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-function-score-query.html) seems promising for your use case. You could try to implement the "threshold" with it.
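As a rough, untested sketch of what that could look like: the body below uses a function_score query whose single function adds a flat weight to any document passing a title filter, so the title bonus is all-or-nothing. Several things here are assumptions rather than anything from the question: the helper name `build_threshold_query`, the bonus of 1000, and above all the use of `minimum_should_match` with per-term fuzziness as a stand-in for "Levenshtein distance to a subset of the title". That stand-in is a term-overlap threshold, not a true edit-distance check on the whole phrase, so the cut-off would need tuning against real data.

```python
def build_threshold_query(user_query, title_bonus=1000.0, min_match="75%"):
    """Build a search body that gives a flat bonus to 'good' title matches."""
    # Assumption: a title counts as "good" when at least `min_match` of the
    # query terms match it, each term allowing a small edit distance.
    good_title = {
        "match": {
            "title": {
                "query": user_query,
                "fuzziness": "AUTO",
                "minimum_should_match": min_match,
            }
        }
    }
    return {
        "query": {
            "function_score": {
                # Documents qualify through a text match or a good title match;
                # constant_score keeps the title clause from adding a graded score.
                "query": {
                    "bool": {
                        "should": [
                            {"match": {"text": {"query": user_query}}},
                            {"constant_score": {"filter": good_title}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # Flat bonus applied only to documents passing the title filter.
                "functions": [{"filter": good_title, "weight": title_bonus}],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        }
    }


body = build_threshold_query("How to garden like the best")
```

The resulting dict can be sent as a search request body. With elasticsearch_dsl it could presumably be loaded via `Search.from_dict(body)`. Because titles below the cut-off do not pass the filter, a title like "Budget Gardening" would get no bonus under this scheme, provided the threshold is set high enough.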
Upvotes: 0