UD1989
UD1989

Reputation: 317

elasticsearch: How to rank first appearing words or phrases higher

For example, if I have the following documents:

1. Casa Road
2. Jalan Casa 

Say my query term is "cas"... on searching, both documents have same scores. I want the one with casa appearing earlier (i.e. document 1 here) and to rank first in my query output.

I am using an edgeNGram Analyzer. Also I am using aggregations so I cannot use the normal sorting that happens after querying.

Upvotes: 1

Views: 1703

Answers (2)

Jrgns
Jrgns

Reputation: 25178

You can use the Bool Query to boost the items that start with the search query:

{
    "bool" : {
        "must" : {
            "match" : { "name" : "cas" }
        },
        "should": {
            "prefix" : { "name" : "cas" }
        },
    }
}

I'm assuming the values you gave is in the name field, and that that field is not analyzed. If it is analyzed, maybe look at this answer for more ideas.

The way it works is:

  1. Both documents will match the query in the must clause, and will receive the same score for that. A document won't be included if it doesn't match the must query.
  2. Only the document with the term starting with cas will match the query in the should clause, causing it to receive a higher score. A document won't be excluded if it doesn't match the should query.

Upvotes: 3

Andrei Stefan
Andrei Stefan

Reputation: 52368

This might be a bit more involved, but it should work. Basically, you need the position of the term within the text itself and, also, the number of terms from the text. The actual scoring is computed using scripts, so you need to enable dynamic scripting in elasticsearch.yml config file:

script.engine.groovy.inline.search: on

This is what you need:

  • a mapping that is using term_vector set to with_positions, and edgeNGram and a sub-field of type token_count:
PUT /test
{
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "term_vector": "with_positions",
          "index_analyzer": "edgengram_analyzer",
          "search_analyzer": "keyword",
          "fields": {
            "word_count": {
              "type": "token_count",
              "store": "yes",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "name_ngrams": {
          "min_gram": "2",
          "type": "edgeNGram",
          "max_gram": "30"
        }
      },
      "analyzer": {
        "edgengram_analyzer": {
          "type": "custom",
          "filter": [
            "standard",
            "lowercase",
            "name_ngrams"
          ],
          "tokenizer": "standard"
        }
      }
    }
  }
}
  • test documents:
POST /test/test/1
{"text":"Casa Road"}
POST /test/test/2
{"text":"Jalan Casa"}
  • the query itself:
GET /test/test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "query": {
              "term": {
                "text": {
                  "value": "cas"
                }
              }
            },
            "script_score": {
              "script": "termInfo=_index['text'].get('cas',_POSITIONS);wordCount=doc['text.word_count'].value;if (termInfo) {for(pos in termInfo){return (wordCount-pos.position)/wordCount}};"
            },
            "boost_mode": "sum"
          }
        }
      ]
    }
  }
}
  • and the results:
   "hits": {
      "total": 2,
      "max_score": 1.3715843,
      "hits": [
         {
            "_index": "test",
            "_type": "test",
            "_id": "1",
            "_score": 1.3715843,
            "_source": {
               "text": "Casa Road"
            }
         },
         {
            "_index": "test",
            "_type": "test",
            "_id": "2",
            "_score": 0.8715843,
            "_source": {
               "text": "Jalan Casa"
            }
         }
      ]
   }

Upvotes: 1

Related Questions