greenboxal

Reputation: 469

Searching by name on ElasticSearch

Say I have an index with thousands of client names, and I need to be able to search them easily from an administration panel. For example:

John Anders
John Smith
Sarah Smith
Bjarne Stroustrup

I want to have full search capabilities on it, which means that:

  1. If I search for John, I should get John Anders and John Smith.

  2. If I search for Smith, I should get both Smiths (John Smith and Sarah Smith).

  3. If I search for sarasmit or sara smit, I should get Sarah Smith, because I searched for the beginning of each part of the name and whitespace doesn't matter.

  4. If I search for johers or joh ers, I should get John Anders, because I searched for strings contained in the name.
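
To pin down what these four cases mean, here is a rough reference matcher in plain Python. It is not Elasticsearch code, only one possible reading of the rules: whitespace in the query is dropped, and the rest must split into pieces that each occur inside successive words of the name, in order. The names `name_matches` and `search` and the `min_piece` cutoff of 3 characters are assumptions made for illustration.

```python
def name_matches(query, name, min_piece=3):
    """Return True if `query` matches `name` under the assumed rules."""
    q = "".join(query.lower().split())   # whitespace in the query is ignored
    tokens = name.lower().split()        # words of the stored name

    def consume(rest, start):
        # Nothing left to match: every piece found a home.
        if not rest:
            return True
        # Try to place the next piece in any later word of the name.
        for i in range(start, len(tokens)):
            # Prefer long pieces; never go below min_piece characters.
            for k in range(len(rest), min_piece - 1, -1):
                if rest[:k] in tokens[i] and consume(rest[k:], i + 1):
                    return True
        return False

    return bool(q) and consume(q, 0)


NAMES = ["John Anders", "John Smith", "Sarah Smith", "Bjarne Stroustrup"]


def search(query):
    """Filter the sample names with the reference matcher."""
    return [n for n in NAMES if name_matches(query, n)]


for q in ["John", "Smith", "sarasmit", "sara smit", "johers", "joh ers"]:
    print(q, "->", search(q))
```

Whatever analyzer and query combination is chosen would need to reproduce roughly this behaviour.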

I already figured out that I could use an analyser with a lowercase filter and a keyword tokenizer, but that combination doesn't cover every case.

What is the right combination of tokenizers/analysers/queries to use?

Upvotes: 1

Views: 1169

Answers (1)

Nathan Smith

Reputation: 8347

Have a look at this; it is a question I asked about a similar data set. Below are the index settings and mapping I used to get some decent results. Development on this has stopped for the time being, but this is what I have so far. You can then develop your queries on top of it:

{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
        "analysis": {
            "filter": {
                "synonym": {
                    "type": "synonym",
                    "synonyms_path": "synonyms/synonyms.txt"
                },
                "my_metaphone": {
                    "type": "phonetic",
                    "encoder": "metaphone",
                    "replace": false
                }
            },
            "analyzer": {
                "synonym": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase",
                        "synonym"
                    ]
                },
                "metaphone": {
                    "tokenizer": "standard",
                    "filter": [
                        "my_metaphone"
                    ]
                },
                "porter": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "porter_stem"
                    ]
                }
            }
        }
    },
    "mappings": {
        "mes": {
            "_all": {
                "enabled": false
            },
            "properties": {
                "pty_forename": {
                    "type": "multi_field",
                    "store": "yes",
                    "fields": {
                        "pty_forename": {
                            "type": "string",
                            "analyzer": "simple"
                        },
                        "metaphone": {
                            "type": "string",
                            "analyzer": "metaphone"
                        },
                        "porter": {
                            "type": "string",
                            "analyzer": "porter"
                        },
                        "synonym": {
                            "type": "string",
                            "analyzer": "synonym"
                        }
                    }
                },
                "pty_full_name": {
                    "type": "string",
                    "index": "not_analyzed",
                    "store": "yes"
                },
                "pty_surname": {
                    "type": "multi_field",
                    "store": "yes",
                    "fields": {
                        "pty_surname": {
                            "type": "string",
                            "analyzer": "simple"
                        },
                        "metaphone": {
                            "type": "string",
                            "analyzer": "metaphone"
                        },
                        "porter": {
                            "type": "string",
                            "analyzer": "porter"
                        },
                        "synonym": {
                            "type": "string",
                            "analyzer": "synonym"
                        }
                    }
                }
            }
        }
    }
}

Note that I have used the phonetic plugin, and I also have a comprehensive list of synonyms, which is critical for returning results for richard when dick is entered.
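
As a possible starting point for those queries, here is a minimal sketch that builds a `multi_match` request body across the sub-fields defined in the mapping above. It only constructs and prints the JSON; it has not been run against a cluster. The helper name `build_name_query` and the boost values (`^3`, `^2`) are assumptions to be tuned, not something taken from my setup.

```python
import json


def build_name_query(text):
    """Build a multi_match body over the forename and surname sub-fields."""
    fields = []
    for base in ("pty_forename", "pty_surname"):
        fields += [
            base + "^3",             # plain "simple"-analyzed field, weighted highest (assumed boost)
            base + ".synonym^2",     # synonym matches, e.g. dick -> richard (assumed boost)
            base + ".metaphone",     # phonetic matches
            base + ".porter",        # stemmed matches
        ]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}


body = build_name_query("dick")
print(json.dumps(body, indent=2))
```

The printed body can be sent to the index's `_search` endpoint. Because every sub-field is searched, a hit on the exact name should generally outrank a purely phonetic hit once the boosts are tuned.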

Upvotes: 1
