altralaser

Reputation: 2073

Query Elasticsearch index for words with and without accent

I query for the word "café" and get 20 articles. When I repeat the search for "cafe", I get only 3 articles. I'm looking for a way to make the accented and unaccented spellings of a word (e.g. "café" and "cafe") match each other.
An additional complication is that the index is already populated, so I have to modify an existing system. I'm using Elasticsearch 6.5.

I found some useful information and went through the following steps:

Set up a folding analyzer

curl -H "Content-Type: application/json" --user <user:pass> -XPUT http://localhost/test/_settings?pretty -d '{
  "analysis": {
    "analyzer": {
      "folding": {
        "tokenizer": "standard",
        "filter":  [ "lowercase", "asciifolding" ]
      }
    }
  }
}'
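To see what the folding analyzer does to a token, here is a rough Python approximation of the "standard" tokenizer followed by the "lowercase" and "asciifolding" filters. It is only a sketch under simplifying assumptions: the function name is hypothetical, tokenization just splits on non-word characters, and accent stripping uses Unicode NFD decomposition. Lucene's ASCIIFoldingFilter uses its own mapping table and also folds characters such as "ß" or "æ" that NFD leaves alone.

```python
import re
import unicodedata
from typing import List

def approx_folding_analyzer(text: str) -> List[str]:
    """Rough stand-in for tokenizer "standard" + filters ["lowercase", "asciifolding"]."""
    # Crude tokenization on runs of non-word characters; the real standard
    # tokenizer follows the Unicode text segmentation rules.
    tokens = [t for t in re.split(r"\W+", text) if t]
    folded = []
    for token in tokens:
        # lowercase filter
        lowered = token.lower()
        # NFD splits "é" into "e" + U+0301 (combining acute accent) ...
        decomposed = unicodedata.normalize("NFD", lowered)
        # ... so dropping combining marks leaves the unaccented base letter.
        folded.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return folded

print(approx_folding_analyzer("Café au lait"))  # ['cafe', 'au', 'lait']
print(approx_folding_analyzer("café") == approx_folding_analyzer("CAFE"))  # True
```

For the authoritative answer on a real cluster, the _analyze API (for example with "analyzer": "folding" and "text": "café" against the test index) returns the tokens Elasticsearch actually produces.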

Modify the existing mapping for the content field

curl -H "Content-Type: application/json" --user <user:pass> -XPUT http://localhost/test/mytype/_mapping -d '{
  "properties" : {
    "content" : {
      "type" : "text",
      "fields" : {
        "folded" : {
          "type" : "text",
          "analyzer" : "folding"
        }
      }
    }
  }
}'

Run the search

curl -H "Content-Type: application/json" --user <user:pass> -XGET http://localhost/test/_search -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "cafe"
          }
        }
      ]
    }
  },
  "size" : 10,
  "from" : 0
}'

But the effect is the same as before: I only find the articles containing "cafe", not the ones containing "café". Am I missing something?

Upvotes: 2

Views: 490

Answers (2)

Assael Azran

Reputation: 2993

Your search query should target content.folded: the folding analyzer is assigned to content.folded, not to content.
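As a sketch of what that means for the query from the question, the Python function below (a hypothetical helper, stdlib only) builds the same bool/query_string body but adds the query_string "fields" parameter. Listing both content and content.folded is one option, my assumption rather than something the answer prescribes: it keeps matching on the original field while also matching the accent-folded sub-field. Using only "content.folded" would also work.

```python
import json

def folded_search_body(user_query, size=10, offset=0):
    """Request body like the question's, but aimed at the accent-folded sub-field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": user_query,
                            # Search the original field and its folded multi-field.
                            "fields": ["content", "content.folded"],
                        }
                    }
                ]
            }
        },
        "size": size,
        "from": offset,
    }

# Serialize for use as the -d payload of the _search request.
print(json.dumps(folded_search_body("cafe"), indent=2))
```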

After updating the mappings, you also have to reindex your data for the change to apply to existing documents.

See the Reindex documentation for a step-by-step guide.

A working example:

Mappings

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "folded": {
              "type": "text",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}

Inserting a few documents

POST my_index/_doc/1
{
  "content":"café"
}

POST my_index/_doc/2
{
  "content":"cafe"
}

Search Query

GET my_index/_search
{
  "query": {
    "match": {
      "content.folded": "cafe"
    }
  }
}

Results

"hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.18232156,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.18232156,
        "_source" : {
          "content" : "café"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.18232156,
        "_source" : {
          "content" : "cafe"
        }
      }
    ]
  }
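Why do both documents come back? The toy Python model below indexes each content value twice, once per sub-field. It uses plain dicts standing in for Lucene's inverted index and rough approximations of the analyzers; none of it is Elasticsearch code, and all names are hypothetical. Because the query text is analyzed with the field's analyzer, "café" and "cafe" end up as the same term only in the folded sub-field.

```python
import re
import unicodedata
from collections import defaultdict

def standard_like(text):
    # Approximates the default "standard" analyzer: tokenize + lowercase.
    return [t.lower() for t in re.split(r"\W+", text) if t]

def folding_like(text):
    # Approximates the "folding" analyzer: standard_like + accent stripping.
    return [
        "".join(c for c in unicodedata.normalize("NFD", t) if not unicodedata.combining(c))
        for t in standard_like(text)
    ]

# Each sub-field of the multi-field gets its own inverted index (term -> doc ids).
index = {"content": defaultdict(set), "content.folded": defaultdict(set)}
analyzers = {"content": standard_like, "content.folded": folding_like}

def index_doc(doc_id, content):
    # The same source value is analyzed once per sub-field.
    for field, analyze in analyzers.items():
        for term in analyze(content):
            index[field][term].add(doc_id)

def search(field, query):
    # Analyze the query with the field's analyzer, then OR the postings together.
    hits = set()
    for term in analyzers[field](query):
        hits |= index[field].get(term, set())
    return sorted(hits)

index_doc(1, "café")
index_doc(2, "cafe")
print(search("content", "cafe"))         # [2]
print(search("content.folded", "cafe"))  # [1, 2]
print(search("content.folded", "café"))  # [1, 2]
```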

Hope this helps

Upvotes: 0

Val

Reputation: 217254

Great start! You have created a new analyzer and changed your mapping. However, you now also need to reindex your data so that the new content.folded field gets filled in.

You can do it very easily by calling the update by query endpoint like this:

curl --user <user:pass> -XPOST http://localhost/test/_update_by_query
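The reason this step is needed: analysis happens at index time, so adding a sub-field to the mapping does not re-analyze documents that are already stored. The toy Python model below illustrates this under stated assumptions (hypothetical names, dicts instead of Lucene segments, each content value a single word). The folded postings stay empty after the mapping change and are filled once every document is re-indexed from its stored _source, which is roughly what _update_by_query does when called without a script.

```python
import unicodedata
from collections import defaultdict

def fold(value):
    # Approximates lowercase + asciifolding on a single-word value.
    decomposed = unicodedata.normalize("NFD", value.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Hypothetical stored documents (id -> _source).
source = {1: {"content": "café"}, 2: {"content": "cafe"}}
# field -> term -> set of doc ids; stands in for the inverted index.
postings = defaultdict(lambda: defaultdict(set))
# Mapping before the change: only "content", analyzed as a single lowercased term.
analyzers = {"content": str.lower}

def index_doc(doc_id, doc):
    # Analysis happens here, at index time, using whatever analyzers are mapped now.
    for field, analyze in analyzers.items():
        postings[field][analyze(doc["content"])].add(doc_id)

for doc_id, doc in source.items():
    index_doc(doc_id, doc)

# Mapping update: the sub-field exists from now on, but nothing is re-analyzed.
analyzers["content.folded"] = fold
print(sorted(postings["content.folded"]["cafe"]))  # []

# Roughly what _update_by_query without a script does: re-index each doc from _source.
for doc_id, doc in source.items():
    index_doc(doc_id, doc)
print(sorted(postings["content.folded"]["cafe"]))  # [1, 2]
```

Since update-by-query rebuilds each document from its stored source, it relies on _source being enabled (the default).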

Upvotes: 1
