Scoring by term position in ElasticSearch?

Question

I'm implementing an auto-complete index in ElasticSearch and have run into an issue with sorting/scoring. Say I have the following strings in an index:

apple banana coconut donut
apple banana donut durian
apple donut coconut durian
donut banana coconut durian

When I search for "donut", I want the results to be ordered by the term location like so:

donut banana coconut durian
apple donut coconut durian
apple banana donut durian
apple banana coconut donut

I can't figure out how to make that happen. Term position isn't factored into the default scoring logic, and I can't find a way to get it in there. Seems like a simple enough issue though that others must have run into this before. Has anyone figured it out yet?

Thanks!

IGx89 · Accepted Answer

Here's the solution I ended up with, based on Andrei's answer and expanded to support multiple search terms and additional scoring based on length of the first word in the result:

First, define the following custom analyzer (it keeps the entire string as a single token and lowercases it):

"raw_analyzer": {
    "type": "custom",
    "filter": [
        "lowercase"
    ],
    "tokenizer": "keyword"
}

Second, define your search field mapping like so (mine's named "name"):

"name": {
    "type": "string",
    "analyzer": "english",
    "fields": {
        "raw": {
            "type": "string",
            "index_analyzer": "raw_analyzer",
            "search_analyzer": "standard"
        }
    }
},
"_nameFirstWordLength": {
    "type": "long"
}

Third, when populating the index use the following logic (mine's in C#) to populate:

_nameFirstWordLength = fi.Name.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries)[0].Length

Finally, do your search as follows:

{
   "query":{
      "bool":{
         "must":{
            "match_phrase_prefix":{
               "name":{
                  "query":"apple"
               }
            }
         },
         "should":{
            "function_score":{
               "query":{
                  "query_string":{
                     "fields":[
                        "name.raw"
                     ],
                     "query":"apple*"
                  }
               },
               "script_score":{
                  "script":"100/doc['_nameFirstWordLength'].value"
               },
               "boost_mode":"replace"
            }
         }
      }
   }
}

I'm using match_phrase_prefix so that partial matches are supported, such as "ap" matching "apple". The bool must/should with that second query_string query against name.raw gives a higher score to results whose name starts with one of the search terms (in my code I'm pre-processing the search string, just for that second query, to add a "*" after every word). Finally, wrapping that second query in a function_score script that uses the value of _nameFirstWordLength causes the results up-scored by the second query to be further sorted by the length of their first word (causing Apple to show before Applebee's, for example).

Scoring by term position in ElasticSearch?

Answers (2)

Related Questions