Andrew Butler

Reputation: 1060

Phonetic search results for integers with Elasticsearch

Forgive me, I am new to Elasticsearch. I am following the phonetic matching guide found here: Phonetic Matching

I have the following index settings and mapping:

POST /app
{
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "dbl_metaphone": {
                        "type": "phonetic",
                        "encoder": "double_metaphone"
                    }
                },
                "analyzer": {
                    "dbl_metaphone": {
                        "tokenizer": "standard",
                        "filter": "dbl_metaphone"
                    }
                }
            }
        }
    },
    "mappings": {
        "movie": {
            "properties": {
                "title": {
                    "type": "string",
                    "fields": {
                        "phonetic": {
                            "type": "string",
                            "analyzer": "dbl_metaphone"
                        }
                    }
                },
                "year": {
                    "type": "string",
                    "fields": {
                        "phonetic": {
                            "type": "string",
                            "analyzer": "dbl_metaphone"
                        }
                    }
                }
            }
        }
    }
}

I index two documents with:

POST /app/movie
{ "title": "300", "year": "2006" }

POST /app/movie
{ "title": "500 days of summer", "year": "2009" }

I want to find the movie '300' with this query:

POST /app/movie/_search
    {
        "query": {
            "match": {
                "title.phonetic": {
                    "query": "three hundred"
                }
            }
        }
    }

but I get no results. If I change the query to "300", it works just fine.

If I do:

GET /app/_analyze?analyzer=dbl_metaphone&text=300
{
  "tokens": [
    {
      "token": "300",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<NUM>",
      "position": 0
    }
  ]
}

I see that only a single numeric token is returned, with no phonetic encoding, unlike the alphanumeric case:

GET /app/_analyze?analyzer=dbl_metaphone&text=three hundred
{
  "tokens": [
    {
      "token": "0R",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "TR",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "HNTR",
      "start_offset": 6,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

Is there something I am missing in my phonetic setup that I need to define to get both the numeric and the alphanumeric tokens?

Upvotes: 2

Views: 410

Answers (2)

Peter Dixon-Moses

Reputation: 3209

A better case for phonetic matching is finding "Judy Steinheiser" when the search query is [Jodi Stynehaser].


If you need to be able to search numbers using English, then you'll need to create some synonyms or alternate text at index-time, so that both "300" and "three hundred" are stored in Elasticsearch.

Shouldn't be too hard to find/write a function that converts integers to English.

Call your function when constructing your document to ingest into ES.

Alternately, write it in Groovy, and call it as a Transform script in your mapping.
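
As a rough illustration of the index-time approach above, here is a minimal Python sketch of such a conversion function, plus a helper that builds the document to ingest. The names `int_to_english`, `spell_out_numbers`, `build_movie_doc` and the extra `title_spoken` field are hypothetical, not part of Elasticsearch or of the mapping in the question; you would still need to map that field with the phonetic analyzer yourself.

```python
import re

# Words for 0-19 and for the multiples of ten.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Largest scale first, so the first match is the leading group.
SCALES = [(1_000_000_000, "billion"), (1_000_000, "million"),
          (1_000, "thousand"), (100, "hundred")]


def int_to_english(n: int) -> str:
    """Spell out an integer in plain English words, e.g. 300 -> 'three hundred'."""
    if n < 0:
        return "minus " + int_to_english(-n)
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] if rest == 0 else TENS[tens] + " " + ONES[rest]
    # n >= 100, so at least the "hundred" scale always matches.
    value, name = next((v, s) for v, s in SCALES if n >= v)
    head, rest = divmod(n, value)
    words = int_to_english(head) + " " + name
    return words if rest == 0 else words + " " + int_to_english(rest)


def spell_out_numbers(text: str) -> str:
    """Replace every run of digits in the text with its English spelling."""
    return re.sub(r"\d+", lambda m: int_to_english(int(m.group())), text)


def build_movie_doc(title: str, year: str) -> dict:
    """Build the document body; 'title_spoken' is an assumed extra field."""
    return {
        "title": title,
        "title_spoken": spell_out_numbers(title),
        "year": year,
    }


if __name__ == "__main__":
    print(build_movie_doc("300", "2006"))
    print(build_movie_doc("500 days of summer", "2009"))
```

With something like this, the spelled-out text is stored next to the original title, so a phonetic match on the spelled-out field could find "300" from the query "three hundred".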

Upvotes: 0

keety

Reputation: 17441

That is not possible. Double Metaphone is a phonetic encoding algorithm. Simply put, it tries to encode similarly pronounced words to the same key.

This makes it possible to search for terms, such as names, that may be spelled differently but sound the same.

As you can see from the algorithm, Double Metaphone ignores numbers and numeric characters. You can read more about Double Metaphone here.

Upvotes: 1
