Reputation: 889
I have an analyzer that ignores whitespace. When I search for a string without spaces, it returns the proper results. This is the analyzer:
{
"index": {
"number_of_shards": 1,
"analysis": {
"filter": {
"word_joiner": {
"type": "word_delimiter",
"catenate_all": true
}
},
"analyzer": {
"word_join_analyzer": {
"type": "custom",
"filter": [
"word_joiner"
],
"tokenizer": "keyword"
}
}
}
}
}
This is how it works:
curl -XGET "http://localhost:9200/cake/_analyze?analyzer=word_join_analyzer&pretty" -d 'ONE"\ "TWO'
Result:
{
"tokens" : [ {
"token" : "ONE",
"start_offset" : 1,
"end_offset" : 5,
"type" : "word",
"position" : 0
}, {
"token" : "ONETWO",
"start_offset" : 1,
"end_offset" : 13,
"type" : "word",
"position" : 0
}, {
"token" : "TWO",
"start_offset" : 7,
"end_offset" : 13,
"type" : "word",
"position" : 1
} ]
}
What I want is to also get a "token" : "ONE TWO" from this analyzer. How can I do this?
Thanks!
Upvotes: 0
Views: 51
Reputation: 217274
You need to enable the preserve_original setting on the word_joiner filter; it is false by default:
{
"index": {
"number_of_shards": 1,
"analysis": {
"filter": {
"word_joiner": {
"type": "word_delimiter",
"catenate_all": true,
"preserve_original": true
}
},
"analyzer": {
"word_join_analyzer": {
"type": "custom",
"filter": [
"word_joiner"
],
"tokenizer": "keyword"
}
}
}
}
}
This will yield:
{
"tokens": [
{
"token": "ONE TWO",
"start_offset": 0,
"end_offset": 7,
"type": "word",
"position": 0
},
{
"token": "ONE",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "ONETWO",
"start_offset": 0,
"end_offset": 7,
"type": "word",
"position": 0
},
{
"token": "TWO",
"start_offset": 4,
"end_offset": 7,
"type": "word",
"position": 1
}
]
}
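To make the effect of the two settings easier to see, here is a rough Python sketch of which token texts come out for a single keyword-tokenized input. It is only a toy model, not how Lucene implements the filter: the function name `word_delimiter_sketch` is made up for illustration, it splits only on non-alphanumeric characters (the real filter also splits on case changes and letter/digit transitions by default), and it ignores offsets and positions.

```python
import re


def word_delimiter_sketch(text, catenate_all=False, preserve_original=False):
    """Toy model (hypothetical helper) of the token texts emitted for one input token."""
    # Split on runs of non-alphanumeric characters (spaces, quotes, punctuation).
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", text) if p]
    tokens = []
    # preserve_original keeps the unsplit input; skip it if nothing was split,
    # so a plain word is not listed twice.
    if preserve_original and parts != [text]:
        tokens.append(text)
    if parts:
        tokens.append(parts[0])
    # catenate_all adds all the parts joined together with no separator.
    if catenate_all and len(parts) > 1:
        tokens.append("".join(parts))
    tokens.extend(parts[1:])
    return tokens


print(word_delimiter_sketch("ONE TWO", catenate_all=True))
print(word_delimiter_sketch("ONE TWO", catenate_all=True, preserve_original=True))
```

With only catenate_all the sketch gives ONE, ONETWO and TWO; adding preserve_original puts "ONE TWO" in front, which matches the token list above.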
Upvotes: 2