Reputation: 63
I would like to match words that are spelled differently but have the same pronunciation, like "mail" and "male", or "plane" and "plain". Can we do such matching in Elasticsearch?
Upvotes: 3
Views: 168
Reputation: 217424
You can use the analysis phonetic plugin for that task.
Let's create an index with a custom analyzer leveraging that plugin:
curl -XPUT localhost:9200/phonetic -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "my_metaphone"
          ]
        }
      },
      "filter": {
        "my_metaphone": {
          "type": "phonetic",
          "encoder": "metaphone",
          "replace": true
        }
      }
    }
  }
}'
Now let's analyze your examples using that new analyzer. As you can see, both plain and plane will produce the single token PLN:
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'plane'
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'plain'
{
  "tokens" : [ {
    "token" : "PLN",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
Same thing for mail and male, which produce the single token ML:
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'mail'
curl -XGET 'localhost:9200/phonetic/_analyze?analyzer=my_analyzer&pretty' -d 'male'
{
  "tokens" : [ {
    "token" : "ML",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
I've used the metaphone encoder, but you're free to use any other supported encoder. The supported encoders are: metaphone, double_metaphone, soundex, caverphone, caverphone1, caverphone2, refined_soundex, cologne, beider_morse, koelnerphonetik, haasephonetik and nysiis.
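To get a feel for why homophones collapse to the same token, here is a rough Python sketch of a phonetic encoder. It is a simplified Soundex (one of the encoders listed above), not the metaphone encoder used in the example and not the plugin's actual implementation, so the codes differ from PLN and ML. The function name `soundex` and the `CODES` table are just names chosen for this illustration.

```python
# Simplified Soundex: an illustration of phonetic encoding in general,
# NOT the code the Elasticsearch phonetic plugin runs.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex-style code for an ASCII word."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous code.
        if digit and digit != previous:
            encoded += digit
        # Vowels reset the previous code; h and w leave it unchanged.
        if ch not in "hw":
            previous = digit
    # Pad with zeros and cut to the usual length of four.
    return (encoded + "000")[:4]


for a, b in (("mail", "male"), ("plane", "plain")):
    print(a, soundex(a), b, soundex(b))
# prints:
# mail M400 male M400
# plane P450 plain P450
```

Keep in mind that phonetic encoders differ in how aggressively they match. This sketch, for instance, also encodes "mule" as M400, so it is worth running your own vocabulary through the _analyze API before settling on an encoder.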
Upvotes: 2
Reputation: 811
A solution which doesn't need a plugin is to use a Synonym Token Filter. Example:
{
  "filter" : {
    "synonym" : {
      "type" : "synonym",
      "synonyms" : [
        "mail, male",
        "plane, plain"
      ]
    }
  }
}
You can also put the synonyms in a text file and reference that; see the documentation I linked to for an example.
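The filter definition above only takes effect once an analyzer references it. As a rough sketch, the Python snippet below builds a complete index settings body that wires the filter into a custom analyzer and prints it as JSON. The analyzer name `synonym_analyzer` and the choice of the standard tokenizer plus lowercase filter are assumptions made for illustration; adjust them to your own setup.

```python
import json

# Index settings that plug the synonym filter into a custom analyzer.
# "synonym_analyzer" is a hypothetical name chosen for this example.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "synonym": {
                    "type": "synonym",
                    "synonyms": ["mail, male", "plane, plain"],
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "standard",
                    # Lowercase first so the synonym rules see lowercase tokens.
                    "filter": ["lowercase", "synonym"],
                }
            },
        }
    }
}

# JSON body to send when creating the index, for example with curl -XPUT ... -d
body = json.dumps(settings, indent=2)
print(body)
```

Note that with this approach every group of homophones has to be listed by hand, so it suits a small, known vocabulary better than open-ended text.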
Upvotes: 0
Reputation: 19273
You can use the phonetic token filter for this purpose. The phonetic token filter is provided by a plugin, so it requires separate installation and setup. This blog explains in detail how to set up and use the phonetic token filter.
Upvotes: 1