I'm using Elasticsearch through the Python requests library. I've set up my analysers like so:
"analysis" : {
"analyzer": {
"my_basic_search": {
"type": "standard",
"stopwords": []
},
"my_autocomplete": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase", "autocomplete"]
}
},
"filter": {
"autocomplete": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20,
}
}
}
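Since the requests are sent from Python anyway, one option is to build the settings body as a Python dict and let json.dumps serialise it, which rules out hand-written JSON mistakes such as a trailing comma after the last key. This is a minimal sketch: the helper name autocomplete_settings is hypothetical, and wrapping "analysis" in a top-level "settings" key assumes the body is used for an index-creation request.

```python
import json


def autocomplete_settings(min_gram=1, max_gram=20):
    """Hypothetical helper: the question's analysis settings as a dict."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Standard analyzer with the stop-word list disabled.
                    "my_basic_search": {"type": "standard", "stopwords": []},
                    # Whole input as one token, lowercased, then edge n-grams.
                    "my_autocomplete": {
                        "type": "custom",
                        "tokenizer": "keyword",
                        "filter": ["lowercase", "autocomplete"],
                    },
                },
                "filter": {
                    "autocomplete": {
                        "type": "edge_ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
            }
        }
    }


# json.dumps always produces valid JSON (no trailing commas).
body = json.dumps(autocomplete_settings(), indent=2)
print(body)
```

The resulting string could then be sent with something like `requests.put("http://localhost:9200/my_index", data=body)`; the mapping would still need to be supplied as well.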
I've got a list of artists who I'd like to search for using autocomplete. My current test case is 'bill w', which should match 'bill withers' etc. The artist mapping looks like this (this is the output of GET http://localhost:9200/my_index/artist/_mapping):
{
  "my_index" : {
    "mappings" : {
      "artist" : {
        "properties" : {
          "clean_artist_name" : {
            "type" : "string",
            "analyzer" : "my_basic_search",
            "fields" : {
              "autocomplete" : {
                "type" : "string",
                "index_analyzer" : "my_autocomplete",
                "search_analyzer" : "my_basic_search"
              }
            }
          },
          "submitted_date" : {
            "type" : "date",
            "format" : "basic_date_time"
          },
          "total_count" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}
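The key detail in this mapping is that the autocomplete sub-field is analysed differently at index time and at search time. A small sketch that reads this out of the mapping's "properties" object; the helper analyzers_for is hypothetical, only handles one level of "fields" nesting, and uses the placeholder "default" when no analyser is declared.

```python
def analyzers_for(properties, path):
    """Hypothetical helper: (index analyzer, search analyzer) for a field path.

    Handles "field" or "field.subfield" (one level of multi-field nesting).
    """
    parent, _, child = path.partition(".")
    field = properties[parent]
    if child:
        field = field["fields"][child]
    # index_analyzer / search_analyzer override the plain "analyzer" setting.
    base = field.get("analyzer", "default")
    return field.get("index_analyzer", base), field.get("search_analyzer", base)


# The "properties" object copied from the GET _mapping output above.
properties = {
    "clean_artist_name": {
        "type": "string",
        "analyzer": "my_basic_search",
        "fields": {
            "autocomplete": {
                "type": "string",
                "index_analyzer": "my_autocomplete",
                "search_analyzer": "my_basic_search",
            }
        },
    },
}

print(analyzers_for(properties, "clean_artist_name.autocomplete"))
# ('my_autocomplete', 'my_basic_search')
```

So anything searched against clean_artist_name.autocomplete with an analysed query goes through my_basic_search, not my_autocomplete.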
...and then I run this query to do the autocomplete:
"query": {
"function_score": {
"query": {
"bool": {
"must" : { "match": { "clean_artist_name.autocomplete": "bill w" } },
"should" : { "match": { "clean_artist_name": "bill w" } },
}
},
"functions": [
{
"script_score": {
"script": "artist-score"
}
}
]
}
}
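The behaviour of the `must` clause can be modelled in plain Python. This is a rough sketch, not Elasticsearch itself: it assumes the standard analyser can be approximated by lowercase `\w+` word tokens, and the artist list is hypothetical. A `match` query analyses "bill w" with the field's search analyser (my_basic_search), and with its default OR operator a document matches if any resulting term is among its indexed edge n-grams.

```python
import re


def edge_ngrams(text, min_gram=1, max_gram=20):
    """Approximate my_autocomplete: keyword tokenizer, lowercase, edge_ngram."""
    token = text.lower()
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


def basic_search_terms(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def match_or(indexed_tokens, query_text):
    """Model of a match query with the default OR operator."""
    return any(term in indexed_tokens for term in basic_search_terms(query_text))


# Hypothetical artist names, not taken from the question's index.
artists = ["Bill Withers", "Bill Evans", "Wilco", "Billy Joel", "Ween", "Adele"]

hits = [a for a in artists if match_or(edge_ngrams(a), "bill w")]
print(hits)  # ['Bill Withers', 'Bill Evans', 'Wilco', 'Billy Joel', 'Ween']

# For contrast: require the whole string "bill w" to be one indexed n-gram.
exact = [a for a in artists if "bill w" in edge_ngrams(a)]
print(exact)  # ['Bill Withers']
```

In this model every name starting with "bill" or with "w" passes the `must` clause, which is consistent with the over-broad results described next.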
This seems to match artists that contain either 'bill' or 'w', as well as 'bill withers'; I only want to match artists that contain that exact string. The analyser seems to be working fine. Here is the output of http://localhost:9200/my_index/_analyze?analyzer=my_autocomplete&text=bill%20w (the my_autocomplete analyser applied to 'bill w'):
{
  "tokens" : [ {
    "token" : "b",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bi",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bil",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bill",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bill ",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "bill w",
    "start_offset" : 0,
    "end_offset" : 6,
    "type" : "word",
    "position" : 1
  } ]
}
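The token list above can be reproduced with a few lines of Python, which makes the analyser chain explicit. A minimal sketch (the function name is hypothetical and it ignores offsets and positions): the keyword tokenizer keeps the whole input as one token, lowercase normalises it, and edge_ngram emits its prefixes from min_gram up to max_gram characters.

```python
def my_autocomplete(text, min_gram=1, max_gram=20):
    """Approximate the my_autocomplete analyzer (tokens only).

    keyword tokenizer -> the whole input is a single token
    lowercase filter  -> lowercase that token
    edge_ngram filter -> emit prefixes of length min_gram..max_gram
    """
    token = text.lower()
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


print(my_autocomplete("bill w"))
# ['b', 'bi', 'bil', 'bill', 'bill ', 'bill w']
```

Note that _analyze with analyzer=my_autocomplete shows only the index-time side; per the mapping, searches on the autocomplete sub-field use my_basic_search instead.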
So why is this not excluding matches that contain just 'bill' or 'w'? Is there something in my query that lets through results that only match via the my_basic_search analyser?
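I believe you need a "term" query instead of a "match" query for your "must" clause. A "match" query runs "bill w" through the field's search analyser, which for clean_artist_name.autocomplete is my_basic_search; that splits it into the terms "bill" and "w", and by default a document matches if it contains either of them. You have already split your artist names into edge ngrams at index time, so your search text should match exactly one of those ngrams. A "term" query is not analysed, so it looks for the exact ngram "bill w":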
"query": {
"function_score": {
"query": {
"bool": {
"must" : { "term": { "clean_artist_name.autocomplete": "bill w" } },
"should" : { "match": { "clean_artist_name": "bill w" } },
}
},
"functions": [
{
"script_score": {
"script": "artist-score"
}
}
]
}
}
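From Python, the same query can be generated per keystroke. A sketch under stated assumptions: autocomplete_query is a hypothetical helper, and lowercasing the input before the `term` clause is an assumption on my part, needed because a term query is not analysed while the indexed ngrams were lowercased by my_autocomplete.

```python
import json


def autocomplete_query(user_input):
    """Hypothetical helper: build the answer's function_score query."""
    # term queries skip analysis, so lowercase here to match the
    # lowercase filter applied by my_autocomplete at index time.
    prefix = user_input.lower()
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": {"term": {"clean_artist_name.autocomplete": prefix}},
                        "should": {"match": {"clean_artist_name": user_input}},
                    }
                },
                "functions": [{"script_score": {"script": "artist-score"}}],
            }
        }
    }


body = autocomplete_query("Bill W")
print(json.dumps(body, indent=2))
```

The body could then be sent with something like `requests.post("http://localhost:9200/my_index/artist/_search", data=json.dumps(body))`.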