Neil

Reputation: 3301

Elasticsearch prioritise one field in query

I have created an elasticsearch index of documents that have a title field and a text field. Given a query, the desired behavior for my search is that it first checks the title field and, if there are any documents where the title is a "good" match with the query, then those documents must be ranked top. Only after good title matches should documents be returned that have a good text match.

By "good title match" I mean something along the lines of "the query is close to some subset of the title, where close means a Levenshtein distance below some given number". This is a threshold condition: either a title is a "good" match and should rank high, or it is not and should receive no benefit for partially matching the query. The outcome is binary.

So if there is a query "How to garden like the best", a document with the title "garden like the best" should be ranked first, followed by documents that have a good match for the query in their "text" field. A document with title "Budget Gardening" should receive no bonus for having "Gardening" in its title, because it is not a good enough match.

Here is my attempt. It uses the Python elasticsearch_dsl library, but the JSON equivalent should be obvious.

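from elasticsearch_dsl import Search

# `query` holds the user's search string
s = Search()

initiated = s.query(
    "multi_match",
    query=query,
    fields=[
        'title^280',   # weight title matches far above text matches
        'text^1'],
    type='best_fields',
    fuzziness='AUTO')  # tolerate small misspellings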

As you can see, I've done a multi match where I've given the "title" field a much higher importance. I've also allowed for some fuzziness for not knowing the exact spelling of the words in the title. The index is also stemmed. This approach has been mostly successful, but I've had two undesirable behaviors:

  1. Documents whose title has anything in common with the query appear very high. For example, the above query ranks a document with the title "budget gardening" higher than a document with a much better text field match. This is because there is no threshold.
  2. Documents that have a very good body match still appear higher than documents where the title is literally the exact query string.

How can I adapt my query to obtain the desired behavior? Thank you.

Upvotes: 0

Views: 459

Answers (1)

Dian Bakti

Reputation: 320

I haven't really tested it, but the function_score query (https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl-function-score-query.html) seems promising for your use case. You could try implementing the "threshold" with it.
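A rough sketch of how the threshold might look with function_score, not run against a real cluster. It only builds the request body as a plain dict. The helper name build_title_priority_query, the "75%" minimum_should_match value, and the bonus of 1000 are all assumptions for illustration and would need tuning against your data. The idea: a title counts as "good" only if enough of the query terms match it, and every such document gets the same flat bonus added to its score, so partial title matches get nothing.

```python
import json


def build_title_priority_query(query_text, title_min_match="75%", title_bonus=1000.0):
    """Build a search body that adds a flat bonus to 'good' title matches.

    title_min_match and title_bonus are illustrative defaults, not tuned values.
    """
    # Ordinary fuzzy relevance on the text field.
    text_match = {
        "match": {"text": {"query": query_text, "fuzziness": "AUTO"}}
    }
    # Threshold condition: the title matches only if enough query terms hit it.
    good_title = {
        "match": {
            "title": {
                "query": query_text,
                "fuzziness": "AUTO",
                "minimum_should_match": title_min_match,
            }
        }
    }
    return {
        "query": {
            "function_score": {
                # A document is returned if its text matches OR its title passes
                # the threshold. constant_score keeps the title clause's own
                # contribution flat instead of proportional to the match quality.
                "query": {
                    "bool": {
                        "should": [
                            text_match,
                            {"constant_score": {"filter": good_title}},
                        ],
                        "minimum_should_match": 1,
                    }
                },
                # Same flat bonus for every document passing the title threshold.
                "functions": [{"filter": good_title, "weight": title_bonus}],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_title_priority_query("How to garden like the best"), indent=2))
```

With elasticsearch_dsl you should be able to load the resulting dict via Search().update_from_dict(body). The bonus has to be larger than any score the text match can realistically produce, otherwise a very strong body match could still outrank a good title match.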

Upvotes: 0
