benwad

Elasticsearch full-text autocomplete

I'm using Elasticsearch through the Python requests library. I've set up my analysers like so:

"analysis" : {
    "analyzer": {
        "my_basic_search": {
            "type": "standard",
            "stopwords": []
        },
        "my_autocomplete": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["lowercase", "autocomplete"]
        }
    },
    "filter": {
        "autocomplete": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 20
        }
    }
}
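For reference, here is a minimal sketch of sending these settings at index-creation time with requests. It assumes Elasticsearch is running on http://localhost:9200 and that `my_index` does not exist yet. `INDEX_SETTINGS` and `create_index` are hypothetical names, and the analysis block is the one above wrapped in a `settings` object.

```python
# Hypothetical payload: the "analysis" block from the question, wrapped in
# "settings". Analysis settings generally have to be supplied when the index
# is created (or while it is closed), not added to an open index.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_basic_search": {"type": "standard", "stopwords": []},
                "my_autocomplete": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase", "autocomplete"],
                },
            },
            "filter": {
                "autocomplete": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
            },
        }
    }
}


def create_index(base_url="http://localhost:9200", index="my_index"):
    """PUT the index with the analysis settings and return the parsed response."""
    # Imported here so the payload above can be inspected without requests installed.
    import requests

    resp = requests.put(f"{base_url}/{index}", json=INDEX_SETTINGS)
    resp.raise_for_status()
    return resp.json()
```

Passing `json=` makes requests serialize the dict and set the `Content-Type: application/json` header.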

I've got a list of artists that I'd like to search with autocomplete. My current test case is 'bill w', which should match 'bill withers' etc. The artist mapping looks like this (this is the output of GET http://localhost:9200/my_index/artist/_mapping):

{
  "my_index" : {
    "mappings" : {
      "artist" : {
        "properties" : {
          "clean_artist_name" : {
            "type" : "string",
            "analyzer" : "my_basic_search",
            "fields" : {
              "autocomplete" : {
                "type" : "string",
                "index_analyzer" : "my_autocomplete",
                "search_analyzer" : "my_basic_search"
              }
            }
          },
          "submitted_date" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "total_count" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}
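The detail that matters in this mapping is the pair of analyzers on the `autocomplete` sub-field: ngrams are produced only at index time, while the search side uses `my_basic_search`. Below is a small hypothetical helper, assuming the ES 1.x response layout shown above, that pulls that pair out of the parsed `_mapping` response:

```python
def autocomplete_analyzers(mapping, index="my_index", doc_type="artist",
                           field="clean_artist_name", subfield="autocomplete"):
    """Return (index_analyzer, search_analyzer) for a multi-field sub-field.

    `mapping` is the parsed JSON of GET /my_index/artist/_mapping in the
    ES 1.x layout shown in the question.
    """
    sub = mapping[index]["mappings"][doc_type]["properties"][field]["fields"][subfield]
    return sub.get("index_analyzer"), sub.get("search_analyzer")


# Trimmed copy of the mapping response from the question.
example_mapping = {
    "my_index": {
        "mappings": {
            "artist": {
                "properties": {
                    "clean_artist_name": {
                        "type": "string",
                        "analyzer": "my_basic_search",
                        "fields": {
                            "autocomplete": {
                                "type": "string",
                                "index_analyzer": "my_autocomplete",
                                "search_analyzer": "my_basic_search",
                            }
                        },
                    }
                }
            }
        }
    }
}

print(autocomplete_analyzers(example_mapping))  # ('my_autocomplete', 'my_basic_search')
```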

...and then I run this query to do the autocomplete:

"query": {
    "function_score": {
        "query": {
            "bool": {
                "must" : { "match": { "clean_artist_name.autocomplete": "bill w" } },
                "should" : { "match": { "clean_artist_name": "bill w" } }
            }
        },
        "functions": [
            {
                "script_score": {
                    "script": "artist-score"
                }
            }
        ]
    }
}
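A minimal sketch of sending this query with requests follows. It assumes Elasticsearch on http://localhost:9200 and the ES 1.x `/index/type/_search` URL. `build_autocomplete_query` and `search` are hypothetical names, and `artist-score` is whatever script the asker already has configured.

```python
def build_autocomplete_query(text):
    """Body of the query above: a required match on the ngram sub-field plus an
    optional match on the plain field (with a "must" present, the "should"
    clause only affects scoring, not which documents match)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": {"match": {"clean_artist_name.autocomplete": text}},
                        "should": {"match": {"clean_artist_name": text}},
                    }
                },
                "functions": [{"script_score": {"script": "artist-score"}}],
            }
        }
    }


def search(body, base_url="http://localhost:9200", index="my_index", doc_type="artist"):
    """POST a search body and return the _source of each hit."""
    # Imported here so the body builder can be used without requests installed.
    import requests

    resp = requests.post(f"{base_url}/{index}/{doc_type}/_search", json=body)
    resp.raise_for_status()
    return [hit["_source"] for hit in resp.json()["hits"]["hits"]]
```

Usage would be `search(build_autocomplete_query("bill w"))`.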

This seems to match artists that contain either 'bill' or 'w', as well as 'bill withers'; I only want to match artists that contain that exact string. The analyser seems to be working fine; here is the output of http://localhost:9200/my_index/_analyze?analyzer=my_autocomplete&text=bill%20w:

{
  "tokens" : [ {
    "token" : "b",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bi",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bil",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bill",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bill ",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bill w",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
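To make the two analyzers concrete, here is a rough pure-Python model of them. It is not Elasticsearch code, just an approximation under stated assumptions: `my_autocomplete` is modelled as keyword tokenizer, then lowercase, then edge_ngram, and the standard analyzer is approximated by a whitespace split plus lowercasing. It reproduces the six tokens above and shows that the search analyzer turns the same input into two separate words.

```python
def my_autocomplete(text, min_gram=1, max_gram=20):
    """Rough model of the index-time analyzer: the keyword tokenizer keeps the
    whole input as one token, which is lowercased and then cut into edge ngrams."""
    token = text.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def my_basic_search(text):
    """Very rough model of the standard analyzer used at search time: split into
    words and lowercase. The real standard tokenizer follows Unicode word-boundary
    rules, but for input like 'bill w' the result is the same."""
    return [word.lower() for word in text.split()]


print(my_autocomplete("bill w"))  # ['b', 'bi', 'bil', 'bill', 'bill ', 'bill w']
print(my_basic_search("bill w"))  # ['bill', 'w']
```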

So why isn't this excluding matches that contain only 'bill' or only 'w'? Is there something in my query that allows results that only match via the my_basic_search analyser?

Answers (1)

Andrei Stefan

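I believe you need a "term" query instead of a "match" one for your "must" clause. A "match" query runs your input through the field's search_analyzer (my_basic_search), which splits "bill w" into the separate terms "bill" and "w", and by default a document matches if either term is found, so any name whose ngrams include "bill" or "w" comes back. You have already split your artist names into ngrams at index time, so your search text should match exactly one of those ngrams. For this to happen you need a "term" query, which is not analyzed and matches the ngram exactly: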

"query": {
    "function_score": {
        "query": {
            "bool": {
                "must" : { "term": { "clean_artist_name.autocomplete": "bill w" } },
                "should" : { "match": { "clean_artist_name": "bill w" } }
            }
        },
        "functions": [
            {
                "script_score": {
                    "script": "artist-score"
                }
            }
        ]
    }
}
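The difference between the two clauses can be simulated in plain Python. This is a sketch under stated assumptions, not Elasticsearch code. Each artist's indexed terms are modelled as the edge ngrams of the lowercased full name; "match" is modelled as the default OR over the whitespace-split input; "term" is modelled as an exact lookup of the unanalyzed input. The artist list and function names are hypothetical. Because a term query skips analysis, the lowercase filter is not applied to the input, so the body builder lowercases it on the client side.

```python
def edge_ngrams(name, max_gram=20):
    # Indexed terms for clean_artist_name.autocomplete (keyword -> lowercase -> edge_ngram).
    token = name.lower()
    return {token[:n] for n in range(1, min(max_gram, len(token)) + 1)}


def match_hits(names, text):
    # "match": input analyzed by my_basic_search into words; by default a
    # document matches if ANY word is one of its indexed terms (OR).
    words = text.lower().split()
    return [n for n in names if any(w in edge_ngrams(n) for w in words)]


def term_hits(names, text):
    # "term": input is not analyzed and must equal one indexed term exactly.
    # It is lowercased here to mimic the client doing so before sending.
    return [n for n in names if text.lower() in edge_ngrams(n)]


def build_term_query(text):
    # The answer's query body, with the input lowercased client-side because
    # the term query will not run it through the lowercase filter.
    return {"query": {"function_score": {
        "query": {"bool": {
            "must": {"term": {"clean_artist_name.autocomplete": text.lower()}},
            "should": {"match": {"clean_artist_name": text}},
        }},
        "functions": [{"script_score": {"script": "artist-score"}}],
    }}}


names = ["Bill Withers", "Bill Evans", "Wilco", "Billy Joel"]  # hypothetical data
print(match_hits(names, "bill w"))  # ['Bill Withers', 'Bill Evans', 'Wilco', 'Billy Joel']
print(term_hits(names, "bill w"))   # ['Bill Withers']
```

Note that with `max_gram` set to 20, a term query for input longer than 20 characters will not find any indexed ngram.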
