Reputation: 469
Say I have an index with thousands of client names, and I need to be able to search them easily from an administration panel. For example:
John Anders
John Smith
Sarah Smith
Bjarne Stroustrup
I want full search capabilities on it, which means that:
If I search for John, I should get John Anders and John Smith.
If I search for Smith, I should get both Smiths (John Smith and Sarah Smith).
If I search for sarasmit or sara smit, I should get Sarah Smith, as I searched for the beginning of each part of the name and the whitespace doesn't matter.
If I search for johers or joh ers, I should get John Anders, as I searched for strings contained in the name.
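To make the expected behaviour concrete, here is a rough plain-Python sketch of the matching rule I have in mind. It does not use Elasticsearch at all. The rule itself is my reading of the cases above and is an assumption: whitespace in the query is ignored, and the rest must split into pieces that are each contained in successive words of the name. The helper names `matches` and `search` are made up for illustration.

```python
from functools import lru_cache

NAMES = ["John Anders", "John Smith", "Sarah Smith", "Bjarne Stroustrup"]


def matches(query: str, name: str) -> bool:
    """True if the query, with whitespace removed, can be split into pieces
    that are substrings of successive words of the name (assumed rule)."""
    q = "".join(query.lower().split())
    words = name.lower().split()

    @lru_cache(maxsize=None)
    def fits(pos: int, word_idx: int) -> bool:
        # pos: how much of the query is consumed; word_idx: first usable word.
        if pos == len(q):
            return True
        for i in range(word_idx, len(words)):
            # Try every non-empty piece of the query starting at pos.
            for end in range(pos + 1, len(q) + 1):
                if q[pos:end] not in words[i]:
                    break  # a longer piece cannot be a substring either
                if fits(end, i + 1):
                    return True
        return False

    return bool(q) and fits(0, 0)


def search(query: str) -> list:
    """Return every known name that satisfies the assumed rule."""
    return [name for name in NAMES if matches(query, name)]


for q in ["John", "Smith", "sarasmit", "sara smit", "johers", "joh ers"]:
    print(q, "->", search(q))
```

Whatever analysis chain is suggested should ideally reproduce these results on the four sample names.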
I have already figured out that I could use an analyser with a lowercase filter and a keyword tokenizer, but that combination doesn't cover every case above.
What is the right combination of tokenizers/analysers/queries to use?
Upvotes: 1
Views: 1169
Reputation: 8347
Have a look at this; it is a question I asked about a similar data set. Below are the index settings and mapping I used to produce some decent results. Development on this has ceased for the interim, but this is what I've produced so far. You can then develop your queries on top of it:
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        },
        "my_metaphone": {
          "type": "phonetic",
          "encoder": "metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "synonym"
          ]
        },
        "metaphone": {
          "tokenizer": "standard",
          "filter": [
            "my_metaphone"
          ]
        },
        "porter": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  },
  "mappings": {
    "mes": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "pty_forename": {
          "type": "multi_field",
          "store": "yes",
          "fields": {
            "pty_forename": {
              "type": "string",
              "analyzer": "simple"
            },
            "metaphone": {
              "type": "string",
              "analyzer": "metaphone"
            },
            "porter": {
              "type": "string",
              "analyzer": "porter"
            },
            "synonym": {
              "type": "string",
              "analyzer": "synonym"
            }
          }
        },
        "pty_full_name": {
          "type": "string",
          "index": "not_analyzed",
          "store": "yes"
        },
        "pty_surname": {
          "type": "multi_field",
          "store": "yes",
          "fields": {
            "pty_surname": {
              "type": "string",
              "analyzer": "simple"
            },
            "metaphone": {
              "type": "string",
              "analyzer": "metaphone"
            },
            "porter": {
              "type": "string",
              "analyzer": "porter"
            },
            "synonym": {
              "type": "string",
              "analyzer": "synonym"
            }
          }
        }
      }
    }
  }
}
Note that I have used the phonetic plugin, and that I also have a comprehensive list of synonyms, which is critical for returning results for richard when dick is entered.
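As a starting point for the queries, here is a minimal sketch that builds a `bool` query with one `match` clause per sub-field of the mapping above, so that a hit on any of the plain, synonym, metaphone or porter variants counts. The field names come from the mapping; the helper name `build_name_query` and the boost values are my own illustrative assumptions, not something I have tuned.

```python
import json

# Sub-field suffixes come from the mapping above ("" is the default sub-field).
# The boost values are illustrative guesses, not tuned numbers.
SUBFIELD_BOOSTS = {"": 3.0, ".synonym": 2.0, ".metaphone": 1.5, ".porter": 1.0}


def build_name_query(text, base_fields=("pty_forename", "pty_surname")):
    """Build a bool/should query that tries every analysed variant of each field."""
    should = []
    for base in base_fields:
        for suffix, boost in SUBFIELD_BOOSTS.items():
            should.append(
                {"match": {base + suffix: {"query": text, "boost": boost}}}
            )
    # With only "should" clauses, a document has to match at least one of them.
    return {"query": {"bool": {"should": should}}}


# Print the request body that would be sent to the index's _search endpoint.
print(json.dumps(build_name_query("dick"), indent=2))
```

The idea is that exact-ish matches on the plain field score highest, while the synonym and phonetic variants still bring back richard for dick or near-miss spellings further down the list.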
Upvotes: 1