Reputation: 2073
When I search for the word "café" I get 20 articles. When I repeat the search for "cafe" I get only 3 articles. I'm looking for a way to treat accented letters the same as their unaccented counterparts.
A further complication is that the index is already populated, so I have to modify an existing system. I'm using Elasticsearch 6.5.
I found some useful information and went through the following steps:
Setting up folding analyzer
curl -H "Content-Type: application/json" --user <user:pass> -XPUT http://localhost/test/_settings?pretty -d '{
  "analysis": {
    "analyzer": {
      "folding": {
        "tokenizer": "standard",
        "filter": [ "lowercase", "asciifolding" ]
      }
    }
  }
}'
Modify existing mapping for the content field
curl -H "Content-Type: application/json" --user <user:pass> -XPUT http://localhost/test/mytype/_mapping -d '{
  "properties" : {
    "content" : {
      "type" : "text",
      "fields" : {
        "folded" : {
          "type" : "text",
          "analyzer" : "folding"
        }
      }
    }
  }
}'
Do the search
curl -H "Content-Type: application/json" --user <user:pass> -XGET http://localhost/test/_search -d '{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "cafe"
          }
        }
      ]
    }
  },
  "size" : 10,
  "from" : 0
}'
But the result is the same as before: I only find the articles containing "cafe", not those containing "café". What am I missing?
Upvotes: 2
Views: 490
Reputation: 2993
Your search query should target content.folded: the folding analyzer is assigned to content.folded, not to content.
After a mapping update you have to reindex your data for the change to take effect.
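Applied to the query from the question, that could look like the sketch below. It assumes the index test and the host http://localhost from the question. ES_AUTH, QUERY_BODY and search_folded are hypothetical names introduced here for illustration, and default_field is only one way to point query_string at the sub-field (a fields list would work as well).

```shell
# Hypothetical credentials variable; replace the fallback with your own user:pass.
ES_AUTH="${ES_AUTH:-user:pass}"

# Same query_string as in the question, but aimed at the folded sub-field.
QUERY_BODY='{
  "query": {
    "query_string": {
      "query": "cafe",
      "default_field": "content.folded"
    }
  },
  "size": 10,
  "from": 0
}'

# Sends the search to the "test" index (host and index taken from the question).
search_folded() {
  curl -s -H "Content-Type: application/json" --user "$ES_AUTH" \
    -XGET "http://localhost/test/_search?pretty" -d "$QUERY_BODY"
}
```

Calling search_folded once the existing documents have been reindexed should return articles containing either spelling.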
See also: Reindex step by step; Reindex.
A working example:
Mappings
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "folded": {
              "type": "text",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
Inserting a few documents
POST my_index/_doc/1
{
  "content": "café"
}

POST my_index/_doc/2
{
  "content": "cafe"
}
Search Query
GET my_index/_search
{
  "query": {
    "match": {
      "content.folded": "cafe"
    }
  }
}
Results
"hits" : {
  "total" : {
    "value" : 2,
    "relation" : "eq"
  },
  "max_score" : 0.18232156,
  "hits" : [
    {
      "_index" : "my_index",
      "_type" : "_doc",
      "_id" : "1",
      "_score" : 0.18232156,
      "_source" : {
        "content" : "café"
      }
    },
    {
      "_index" : "my_index",
      "_type" : "_doc",
      "_id" : "2",
      "_score" : 0.18232156,
      "_source" : {
        "content" : "cafe"
      }
    }
  ]
}
Hope this helps
Upvotes: 0
Reputation: 217254
Great start! You have created a new analyzer and changed your mapping. However, you now also need to reindex your data in order to populate the new content.folded field.
You can do that very easily by calling the update-by-query endpoint like this:
curl --user <user:pass> -XPOST http://localhost/test/_update_by_query
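A slightly fuller sketch, under the same assumptions as the question (index test on http://localhost): conflicts=proceed is an optional parameter that lets the run continue past version conflicts instead of aborting, and the _analyze call is one way to check that the folding analyzer turns "café" into the token cafe. ES_AUTH, ANALYZE_BODY, reindex_in_place and check_folding are hypothetical names introduced here.

```shell
# Hypothetical credentials variable; replace the fallback with your own user:pass.
ES_AUTH="${ES_AUTH:-user:pass}"

# Sample text to run through the custom analyzer defined in the question.
ANALYZE_BODY='{
  "analyzer": "folding",
  "text": "café"
}'

# Re-indexes every document of "test" in place, so that new sub-fields
# such as content.folded are populated from the stored _source.
reindex_in_place() {
  curl -s --user "$ES_AUTH" \
    -XPOST "http://localhost/test/_update_by_query?conflicts=proceed&pretty"
}

# Shows the tokens the "folding" analyzer produces for the sample text.
check_folding() {
  curl -s -H "Content-Type: application/json" --user "$ES_AUTH" \
    -XGET "http://localhost/test/_analyze?pretty" -d "$ANALYZE_BODY"
}
```

Running check_folding first and reindex_in_place second keeps the order safe: if the analyzer is missing from the index settings, the _analyze request fails before any documents are touched.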
Upvotes: 1