Reputation: 90
I want to write a query in Elasticsearch that returns documents containing all the words in the search query, matched not only as complete words but also as subwords. For example, if I have a document with these values:
{
  "first_name": "didier",
  "last_name": "drogba"
}
and I search for "didi dro", this document should be returned. If I search for "david drogba", the document should be ignored, because it doesn't contain the word "david" even as a subword. I tried using the ngram tokenizer but couldn't achieve what I want.
The index I created:
PUT doctors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram"
        }
      }
    }
  }
}
Then I added the mapping:
PUT doctors/_doc/_mapping
{
  "properties": {
    "first_name": {
      "type": "text",
      "analyzer": "my_analyzer"
    },
    "last_name": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
Then I added a document:
POST doctors/_doc/1
{
  "first_name": "dito",
  "last_name": "janelidze",
  "specialism": "oftalmologist",
  "location_name": "evex saburtalo clinic",
  "brand": "Evex",
  "address": "kavtaradze street N21"
}
And my search query looks like this:
GET doctors/_doc/_search
{
  "query": {
    "multi_match": {
      "query": "david jane",
      "fields": ["first_name", "last_name"]
    }
  }
}
It returns the document I inserted, but I don't want it, because the document doesn't contain the word "david".
Upvotes: 0
Views: 81
Reputation: 8860
The ngram tokenizer constructs tokens of specified lengths from the input words. The lengths are controlled by min_gram and max_gram in the mapping, which default to 1 and 2 respectively if you don't specify them.
I've updated the mapping you provided with min_gram: 3 and max_gram: 6.
The ngram tokenizer would then create the tokens; for example, for didier they would be did, idi, die, ier, didi, idie, dier, didie, idier, and didier, which are eventually stored in the inverted index.
With the defaults of 1 and 2 as min_gram and max_gram, notice that didier and david share the subword id (as well as the single letters d and i), which is why the document was returned.
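As a quick check, the _analyze API shows exactly which tokens an analyzer emits; a sketch using the my_analyzer and doctors names from the question:

```json
POST doctors/_analyze
{
  "analyzer": "my_analyzer",
  "text": "didier"
}
```

The response lists every emitted token, so you can verify directly whether didier and david share any tokens under a given min_gram/max_gram setting.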
Mapping
PUT doctors
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 6
        }
      }
    }
  }
}
Note that on recent Elasticsearch versions you may also need to raise the index.max_ngram_diff index setting, since by default min_gram and max_gram may differ by at most 1.
That said, even after the mapping change, if your query string is david jane, the query you have would search for david OR jane in first_name OR last_name. That means the dito janelidze document would still be returned (though with a lower score than one containing david jane).
Using the operator AND would make the search david AND jane in first_name OR in last_name, which isn't what you are looking for either.
Instead, you can use the bool query below, or create another field called name, copy the values of first_name and last_name into it using copy_to, and search against that field.
Query
POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "first_name": "david"
          }
        },
        {
          "match": {
            "last_name": "jane"
          }
        }
      ]
    }
  }
}
Unfortunately, you would need to delete and recreate the index and ingest the documents again, since the required changes are at the mapping level.
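For the copy_to alternative mentioned above, a sketch of the mapping might look like this (the name field is only a suggested name, and this uses the typeless 7.x mapping syntax rather than the _doc style in the question):

```json
PUT doctors
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "first_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "copy_to": "name"
      },
      "last_name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "copy_to": "name"
      }
    }
  }
}
```

A match query on name with "operator": "and" would then require every search term to match somewhere in the combined first-plus-last-name value.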
Hope this helps!
Upvotes: 1
Reputation: 1770
+1 for the operator "and" for each word. Use this; it works for me (it can be used for autocomplete too).
settings:
"analysis": {
  "filter": {
    "name_ngrams": {
      "max_gram": "20",
      "type": "edgeNGram",
      "min_gram": "1",
      "side": "front"
    }
  },
  "analyzer": {
    "partial_name": {
      "type": "custom",
      "filter": [
        "lowercase",
        "name_ngrams",
        "standard",
        "asciifolding"
      ],
      "tokenizer": "standard"
    },
    "full_name": {
      "type": "custom",
      "filter": [
        "standard",
        "lowercase",
        "asciifolding"
      ],
      "tokenizer": "standard"
    }
  }
}
mapping:
"first_name": {
  "type": "text",
  "analyzer": "partial_name",
  "search_analyzer": "full_name"
},
"last_name": {
  "type": "text",
  "analyzer": "partial_name",
  "search_analyzer": "full_name"
}
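With this edge-ngram setup, a search using the "and" operator referenced above might look like the sketch below. The query values are illustrative, and the cross_fields type is my own assumption (it applies the and operator per term across both fields, which matches what the asker wants):

```json
GET doctors/_search
{
  "query": {
    "multi_match": {
      "query": "didi dro",
      "fields": ["first_name", "last_name"],
      "operator": "and",
      "type": "cross_fields"
    }
  }
}
```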
Upvotes: 1