Miguel

Reputation: 586

Order the results by best match on another field

I have recently started learning Elasticsearch and I have a lot of questions.

The most immediate one is how to order results by how well they match a second field, without filtering on that field.

I have the following set of data:

{
    "id": 1,
    "name": "John Smith",
    "categories": ["1", "2"]
},
{
    "id": 2,
    "name": "John Smith",
    "categories": ["2", "3"]
},
{
    "id": 3,
    "name": "John Doe",
    "categories": ["2", "4"]
}

I want to search by name and, when several results match the name equally well, order them by best match on categories.

My current query only filters by name:

{
    "query": {
        "bool": {
            "must": {
                "bool": {
                    "should": [
                        {
                            "query_string": {
                                "query": "*John Smith*",
                                "fields": ["name"],
                                "default_operator": "and",
                                "boost": 10
                            }
                        },
                        {
                            "match": {
                                "name": {
                                    "query": "John Smith",
                                    "fuzziness": "AUTO",
                                    "operator": "and"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

In this case the result is two hits ("id": 1 and "id": 2), but I would like to order them by categories. For example, if I also asked for "categories": ["3", "4"], the first result should be the record with "id": 2, because that record has a category (3) that matches.

How can I modify my query to achieve this requirement?

Upvotes: 2

Views: 1993

Answers (1)

Nikolay Vasiliev

Reputation: 6066

You're almost there, although I must say this question is more about search result relevance than about ordering (sorting).

To achieve your goal you can add a should clause next to the must clause of the outer bool query:

{
    "query": {
        "bool": {
            "must": {
                "bool": {
                    "should": [
                        {
                            "query_string": {
                                "query": "*John Smith*",
                                "fields": ["name"],
                                "default_operator": "and",
                                "boost": 10
                            }
                        },
                        {
                            "match": {
                                "name": {
                                    "query": "John Smith",
                                    "fuzziness": "AUTO",
                                    "operator": "and"
                                }
                            }
                        }
                    ]
                }
            },
            "should": [
              {
                "terms": {
                  "categories": [
                    "3",
                    "4"
                  ]
                }
              }
            ]
        }
    }
}

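This works because the should clause here only affects the score: documents that also match the extra conditions are ranked higher, and no document is excluded for failing them. As the Elasticsearch documentation on the bool query puts it: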

If the bool query is in a query context and has a must or filter clause then a document will match the bool query even if none of the should queries match. In this case these clauses are only used to influence the score.
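If you build the request body from application code, the same idea can be sketched roughly as below. The function name build_query and the per_category flag are my own, hypothetical names, not part of any Elasticsearch client. The flag is there because, as far as I know, a single terms query gives every matching document the same score regardless of how many of the listed values it contains, whereas separate term clauses inside should each add to the score when they match. Treat that as an assumption to verify against your version.

```python
import json


def build_query(name, categories, per_category=False):
    """Build a search body: the name must match, categories only affect the score.

    Hypothetical helper for illustration; it only produces a plain dict.
    """
    # Same two name clauses as in the query above.
    name_clauses = [
        {
            "query_string": {
                "query": f"*{name}*",
                "fields": ["name"],
                "default_operator": "and",
                "boost": 10,
            }
        },
        {
            "match": {
                "name": {"query": name, "fuzziness": "AUTO", "operator": "and"}
            }
        },
    ]

    if per_category:
        # One term clause per category, so each matching category
        # contributes its own score (assumption: this ranks documents
        # with more matching categories higher).
        category_clauses = [{"term": {"categories": c}} for c in categories]
    else:
        # Single terms clause, exactly as in the query above.
        category_clauses = [{"terms": {"categories": list(categories)}}]

    return {
        "query": {
            "bool": {
                "must": {"bool": {"should": name_clauses}},
                "should": category_clauses,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("John Smith", ["3", "4"]), indent=4))
```

Calling build_query("John Smith", ["3", "4"]) produces the same body as the query above. Passing per_category=True may be worth trying if you want a record that matches both requested categories to rank above one that matches only one.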

You can find some more information about relevance score here.

Hope that helps!

Upvotes: 2
