henry blake

Reputation: 63

Matching words with the same pronunciation in Elasticsearch

I would like to match words that are spelled differently but have the same pronunciation, such as "mail" and "male" or "plane" and "plain". Is this kind of matching possible in Elasticsearch?

Upvotes: 3

Views: 168

Answers (3)

Val

Reputation: 217424

You can use the Phonetic Analysis plugin (analysis-phonetic) for that task.

Let's create an index with a custom analyzer leveraging that plugin:

curl -XPUT localhost:9200/phonetic -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_metaphone"
          ]
        }
      },
      "filter": {
        "my_metaphone": {
          "type": "phonetic",
          "encoder": "metaphone",
          "replace": true
        }
      }
    }
  }
}'

Now let's analyze your example using that new analyzer. As you can see, both plain and plane will produce the single token PLN:

curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'plane'
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'plain'

{
  "tokens" : [ {
    "token" : "PLN",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

Same thing for mail and male which produce the single token ML:

curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'mail'
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'male'

{
  "tokens" : [ {
    "token" : "ML",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

I've used the metaphone encoder, but you're free to use any other supported encoder. You can find more information on all supported encoders:

  • in the Apache Codec documentation for metaphone, double_metaphone, soundex, caverphone, caverphone1, caverphone2, refined_soundex, cologne, beider_morse
  • in the additional encoders for koelnerphonetik, haasephonetik and nysiis
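To make it clearer why a phonetic encoder collapses homophones into one token, here is a minimal sketch of classic American Soundex (one of the encoders listed above) in plain Python. It is an illustration only, not the plugin's or Apache Commons Codec's implementation, and the function name `soundex` is my own choice. Words that receive the same code would be indexed as the same token.

```python
# Minimal American Soundex sketch (illustrative; not the plugin's code).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ("" for empty input)."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    first = word[0]
    encoded = first.upper()
    prev = CODES.get(first, "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        digit = CODES.get(ch, "")  # vowels (and y) map to ""
        if digit and digit != prev:
            encoded += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


for a, b in (("mail", "male"), ("plane", "plain")):
    print(a, soundex(a), b, soundex(b))
```

Metaphone uses more elaborate rules than this, which is why it yields tokens such as PLN and ML rather than a letter followed by digits, but the principle is the same: the encoded form, not the spelling, is what gets matched.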

Upvotes: 2

sean

Reputation: 811

A solution which doesn't need a plugin is to use a Synonym Token Filter. Example:

{
    "filter" : {
        "synonym" : {
            "type" : "synonym",
            "synonyms" : [
                "mail, male",
                "plane, plain"
            ]
        }
    }
}

You can also put the synonyms in a text file and reference that; see the documentation I linked to for an example.
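The fragment above only defines the filter, so here is a rough sketch of how it might be wired into a complete index settings body. It is written in Python purely to build and print the JSON. The names `build_synonym_settings`, `my_synonym` and `my_synonym_analyzer` are hypothetical, and the analyzer layout (standard tokenizer, then lowercase, then the synonym filter) is my assumption rather than something stated in this answer.

```python
import json


def build_synonym_settings(synonym_groups):
    """Build an index settings body with a synonym filter and an analyzer using it.

    synonym_groups: list of word lists, e.g. [["mail", "male"], ...].
    All filter/analyzer names below are hypothetical placeholders.
    """
    # Each group becomes one comma-separated rule, e.g. "mail, male".
    synonyms = [", ".join(group) for group in synonym_groups]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_synonym": {"type": "synonym", "synonyms": synonyms}
                },
                "analyzer": {
                    "my_synonym_analyzer": {
                        "tokenizer": "standard",
                        # Lowercase first so the rules match regardless of case.
                        "filter": ["lowercase", "my_synonym"],
                    }
                },
            }
        }
    }


body = build_synonym_settings([["mail", "male"], ["plane", "plain"]])
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body when creating an index. Keep in mind that every group of homophones has to be listed by hand with this approach.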

Upvotes: 0

Vineeth Mohan

Reputation: 19273

You can use the phonetic token filter for this purpose. It is provided by a plugin, so it requires separate installation and setup. This blog explains in detail how to set up and use the phonetic token filter.

Upvotes: 1
