shantanuo

Reputation: 32316

analyzer for spelling mistakes

I have saved the user inputs directly in Elasticsearch. The name field has various spelling combinations for the same student.

PrabhuNath Prasad
PrabhuNathPrasad
Prabhu NathPrasad

Prabhu Nath Prashad
PrabhuNath Prashad
PrabhuNathPrashad
Prabhu NathPrashad

The student's real name is "Prabhu Nath Prasad", and when I search by that name I should get all of the above results back. Is there an analyzer in Elasticsearch that can take care of this?

Upvotes: 2

Views: 2082

Answers (2)

ChintanShah25

Reputation: 12672

You can do that with a custom analyzer. This is my setup:

POST name_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "char_filter": [
            "space_removal"
          ],
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "char_filter": {
        "space_removal": {
          "type": "pattern_replace",
          "pattern": "\\s+",
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "variation": {
              "type": "string",
              "analyzer": "my_custom_analyzer"
            }
          }
        }
      }
    }
  }
}

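I have mapped name twice: the main field uses the standard analyzer, and the name.variation sub-field uses my_custom_analyzer. The custom analyzer combines the keyword tokenizer, the lowercase and asciifolding filters, and a char_filter that removes whitespace, so the whole name is joined into a single token. This char_filter is what lets us query the different spacing variations effectively.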
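To see what ends up in name.variation, here is a rough Python sketch of the analyzer chain. It is only an approximation, not what Elasticsearch runs internally: the function name normalize is made up for illustration, and the NFKD-based folding is a stand-in for the real asciifolding filter.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Approximate my_custom_analyzer: one lowercase ASCII token, no spaces."""
    no_spaces = re.sub(r"\s+", "", name)   # char_filter: space_removal
    lowered = no_spaces.lower()            # filter: lowercase
    # Rough stand-in for asciifolding: drop accents via NFKD decomposition.
    decomposed = unicodedata.normalize("NFKD", lowered)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# The seven spellings from the question.
variants = [
    "PrabhuNath Prasad",
    "PrabhuNathPrasad",
    "Prabhu NathPrasad",
    "Prabhu Nath Prashad",
    "PrabhuNath Prashad",
    "PrabhuNathPrashad",
    "Prabhu NathPrashad",
]

tokens = {normalize(v) for v in variants}
print(sorted(tokens))  # ['prabhunathprasad', 'prabhunathprashad']
```

All seven spellings collapse to just two tokens, so the spacing differences disappear and only the prasad/prashad difference is left for the query to deal with.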

I indexed all 7 of the combinations you listed. This is my query:

GET name_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "Prabhu Nath Prasad"
          }
        },
        {
          "match": {
            "name.variation": {
              "query": "Prabhu Nath Prasad",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}

This handles all of your variations, and it will also return documents that contain only part of the name, such as prabhu or prasad.
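Why the fuzzy clause on name.variation catches the "Prashad" spellings can be checked with a small Python sketch. Assumptions: the helper names are made up, plain Levenshtein distance is used as a simplification (Elasticsearch counts Damerau-Levenshtein edits, where a transposition is one edit), and auto_fuzziness mirrors the documented AUTO defaults of 0 edits for terms of 1-2 characters, 1 edit for 3-5, and 2 edits for longer terms.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def auto_fuzziness(term: str) -> int:
    """Edits allowed by "fuzziness": "AUTO" with its default thresholds."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


query_token = "prabhunathprasad"     # the search text after my_custom_analyzer
indexed_token = "prabhunathprashad"  # a "Prashad" spelling after analysis

distance = levenshtein(query_token, indexed_token)
print(distance, distance <= auto_fuzziness(query_token))  # 1 True
```

The two tokens are one edit apart (the extra "h"), which is within the two edits AUTO allows for a term this long.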

Hope this helps!!

Upvotes: 5

Anirudh Modi

Reputation: 1829

There is no analyzer for that. However, you can look into fuzzy matching.

Specify the fuzziness in your query, which can help you retrieve the above records.

I suggest you go through the links below:

https://www.elastic.co/blog/found-fuzzy-search

https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html

https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzziness.html

This will help you achieve what you want.

Also, there won't be any straightforward way to get the record if the user has typed "PrabhuNath", because Elasticsearch will treat it as a single token. However, you can use a "phrase_prefix" query, which helps you fetch records while the user is typing.

Your query will look like this to handle basic spelling mistakes:

{
  "query": {
    "match": {
      "name": {
        "query": "PrabhuNath Prasad",
        "fuzziness": 2
      }
    }
  }
}
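To illustrate the single-token point, here is a rough Python simulation of that query against the seven spellings from the question. It is a sketch under assumptions: splitting on whitespace and lowercasing stands in for the standard analyzer (close enough for these plain names), plain Levenshtein stands in for Elasticsearch's edit distance, the match is treated as OR across terms (the match query default), and the helper names are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def tokens(name: str) -> list:
    """Stand-in for the standard analyzer on these simple names."""
    return name.lower().split()


def fuzzy_match(query: str, doc: str, max_edits: int = 2) -> bool:
    """True if any query term is within max_edits of any document term."""
    return any(levenshtein(q, t) <= max_edits
               for q in tokens(query) for t in tokens(doc))


variants = [
    "PrabhuNath Prasad", "PrabhuNathPrasad", "Prabhu NathPrasad",
    "Prabhu Nath Prashad", "PrabhuNath Prashad", "PrabhuNathPrashad",
    "Prabhu NathPrashad",
]

# Spellings the fuzzy query on the plain name field would not reach.
missed = [v for v in variants if not fuzzy_match("PrabhuNath Prasad", v)]
print(missed)
```

In this simulation the spellings where "Prasad" or "Prashad" is glued to the preceding word are the ones that get missed, since a joined token is more than two edits away from either query term.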

Upvotes: 2
