Reputation: 3651
I'm trying to get Elasticsearch to perform a phonetic search on a list of cities. My goal is to find matching results even if the user misspells the name.
I've done the following steps:
Remove the index
curl -X DELETE "localhost:9200/city/"
Create a new index
curl -X PUT "localhost:9200/city/?pretty" -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}'
Index some sample data
curl -X PUT "localhost:9200/city/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
"name":"Mayrhofen"
}
'
curl -X PUT "localhost:9200/city/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
"name":"Ischgl"
}
'
curl -X PUT "localhost:9200/city/_doc/3?pretty" -H 'Content-Type: application/json' -d'
{
"name":"Saalbach"
}
'
Search the cities - here I get a result
curl -X GET "localhost:9200/city/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query":{
"query_string":{
"query":"Mayrhofen"
}
}
}
'
I then tried the query with Mayerhofen and expected the same result as with Mayrhofen, but did not get it. The same issue occurs with Ichgl instead of Ischgl and Salbach instead of Saalbach.
Where's my error? Is something missing?
Upvotes: 0
Views: 1851
Reputation: 22974
The problem is that you are using the wrong encoder: metaphone cannot match those spellings.
For your inputs you need double_metaphone instead. Both are implementations of phonetic algorithms, so I would suggest studying your data and the algorithm to check whether a phonetic algorithm is the best fit for your purpose.
Settings (analysis section):
{
"analysis": {
"analyzer": {
"double_meta_true_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"true_doublemetaphone"
]
}
},
"filter": {
"true_doublemetaphone": {
"type": "phonetic",
"encoder": "double_metaphone",
"replace": true
}
}
}
}
With this analyzer, the query matches the documents.
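To apply this to the index from the question, the analysis block can be dropped into the original index definition. The following is only a sketch under a few assumptions: the cluster is reachable at `localhost:9200` as in the question, the analysis-phonetic plugin is installed, and the helper names `city_index_body` and `recreate_city_index` as well as the filter name `my_double_metaphone` are made up for illustration.

```shell
# Assumption: Elasticsearch listens here and has the analysis-phonetic plugin installed.
ES_URL="${ES_URL:-localhost:9200}"

# Print the index definition: the question's mapping with the encoder swapped
# from metaphone to double_metaphone.
city_index_body() {
  cat <<'EOF'
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "tokenizer": "standard",
            "filter": ["lowercase", "my_double_metaphone"]
          }
        },
        "filter": {
          "my_double_metaphone": {
            "type": "phonetic",
            "encoder": "double_metaphone",
            "replace": true
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "my_analyzer" }
    }
  }
}
EOF
}

# Drop and recreate the index (hypothetical helper; nothing calls it automatically).
recreate_city_index() {
  curl -s -X DELETE "$ES_URL/city/" >/dev/null
  city_index_body | curl -s -X PUT "$ES_URL/city/?pretty" \
    -H 'Content-Type: application/json' --data-binary @-
}
```

Calling `recreate_city_index` deletes the existing `city` index, so the sample documents have to be indexed again afterwards, because the analyzer of an existing field cannot be changed in place.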
Why metaphone does not match:
GET http://localhost:9200/city2/_analyze
{
"field":"meta_true",
"text":"Mayrhofen"
}
yields
{
"tokens": [
{
"token": "MRHF",
"start_offset": 0,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 0
}
]
}
And analysing the misspelled name
{
"field":"meta_true",
"text":"Mayerhofen"
}
yields
{
"tokens": [
{
"token": "MYRH",
"start_offset": 0,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 0
}
]
}
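The two tokens differ (MRHF versus MYRH), so the misspelled query term never equals the indexed term. The same comparison can be scripted for any pair of spellings. This is a rough sketch, assuming the `city` index and the `my_analyzer` analyzer from the question; `extract_tokens`, `analyze_city` and `same_tokens` are hypothetical helper names, and the token extraction uses `grep -o` and `sed` rather than a real JSON parser.

```shell
# Assumption: Elasticsearch is reachable here, as in the question.
ES_URL="${ES_URL:-localhost:9200}"

# Pull every "token" value out of an _analyze response read from stdin,
# one token per line. A crude text match, not a JSON parser.
extract_tokens() {
  grep -o '"token" *: *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}

# Analyze one spelling ($2) with a named analyzer ($1) of the city index.
# Needs a running cluster; $2 must not contain double quotes or backslashes.
analyze_city() {
  curl -s -X GET "$ES_URL/city/_analyze" -H 'Content-Type: application/json' \
    -d "{\"analyzer\":\"$1\",\"text\":\"$2\"}" | extract_tokens
}

# Report whether two spellings ($2, $3) produce the same token list.
same_tokens() {
  if [ "$(analyze_city "$1" "$2")" = "$(analyze_city "$1" "$3")" ]; then
    echo "match"
  else
    echo "no match"
  fi
}
```

For example, `same_tokens my_analyzer Mayrhofen Mayerhofen` should report "no match" for the metaphone setup, given the tokens shown above.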
double_metaphone, in contrast, works as follows:
GET
{
"field":"doublemeta_true",
"text":"Mayerhofen"
}
and
{
"field":"doublemeta_true",
"text":"Mayrhofen"
}
yield the same token, MRFN (the offsets shown are for Mayerhofen):
{
"tokens": [
{
"token": "MRFN",
"start_offset": 0,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 0
}
]
}
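Because both spellings analyse to MRFN, a search for the misspelled name should find the Mayrhofen document once the `name` field uses a double_metaphone analyzer. A minimal sketch, assuming the `city` index from the question has been recreated that way; `search_body` and `search_city` are hypothetical helper names, and a `match` query is used here instead of the question's `query_string` simply to name the target field explicitly.

```shell
# Assumption: Elasticsearch is reachable here, as in the question.
ES_URL="${ES_URL:-localhost:9200}"

# Build a match query on the name field.
# $1 must not contain double quotes or backslashes (no JSON escaping is done).
search_body() {
  printf '{"query":{"match":{"name":"%s"}}}\n' "$1"
}

# Run the search against the city index (needs a running cluster).
search_city() {
  curl -s -X GET "$ES_URL/city/_search?pretty" -H 'Content-Type: application/json' \
    -d "$(search_body "$1")"
}
```

Usage would be `search_city Mayerhofen`. The query text goes through the analyzer configured on `name`, so the comparison happens on the phonetic tokens rather than on the raw spelling.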
Upvotes: 1