Reputation: 5088
I have my Elasticsearch data stored in the following format:
{
  "person_name": "Abraham Benjamin deVilliers",
  "name": "Abraham",
  "office": {
    "name": "my_office"
  }
},
{
  "person_name": "Johnny O'Ryan",
  "name": "O'Ryan",
  "office": {
    "name": "Johnny O'Ryan"
  }
},
......
And I have a multi_match query to search based on person_name, name, and office.name, as follows:
{
  "query": {
    "multi_match": {
      "query": "O'Ryan",
      "type": "best_fields",
      "fields": [ "person_name", "name", "office.name" ],
      "operator": "and"
    }
  }
}
And it works fine: I get the expected result when the query exactly matches name, person_name, or office.name, as below.
{
  "person_name": "Johnny O'Ryan",
  "name": "O'Ryan",
  "office": {
    "name": "Johnny O'Ryan"
  }
}
Now I want the search to return the same response when the user passes ORyan as the query instead of O'Ryan, i.e. ignoring the single quote (') in the stored data.
Is there a way to do this at query time, or do I need to strip the special characters while storing the data in Elasticsearch?
Any help will be appreciated.
Upvotes: 1
Views: 229
Reputation: 334
What you are looking for is a tokenizer: Tokenizers
In your case, you can try something like:
GET /_analyze
{
  "tokenizer": "letter",
  "filter": [],
  "text": "O'Ryan is good"
}
It will produce the following tokens:
{
  "tokens": [
    {
      "token": "O",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "Ryan",
      "start_offset": 2,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "is",
      "start_offset": 7,
      "end_offset": 9,
      "type": "word",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 10,
      "end_offset": 14,
      "type": "word",
      "position": 3
    }
  ]
}
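For comparison, the standard tokenizer (most likely what your fields use today, since no custom mapping is shown in the question) keeps the apostrophe inside the token, which is probably why a search for ORyan currently finds nothing:
GET /_analyze
{
  "tokenizer": "standard",
  "text": "O'Ryan is good"
}
This returns the tokens O'Ryan, is and good, so the indexed term still contains the quote and ORyan cannot match it.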
Update:
You could also add a Mapping Char Filter to the analyzer used on the name fields (or whatever field in which the single quote is a problem):
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "' => "
          ]
        }
      }
    }
  }
}
If you run:
POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "O'Bryan is a good"
}
You will get:
{
  "tokens": [
    {
      "token": "OBryan",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 8,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "good",
      "start_offset": 13,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
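Note that the settings above only define my_analyzer; you still have to reference it in the mapping of the fields you search on. A minimal sketch, assuming Elasticsearch 7+ and the field names from your documents (person_name, name and office.name); I have also added the lowercase token filter so matching stays case-insensitive, as it is with the default standard analyzer:
PUT my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "my_char_filter": {
          "type": "mapping",
          "mappings": [
            "' => "
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "person_name": { "type": "text", "analyzer": "my_analyzer" },
      "name": { "type": "text", "analyzer": "my_analyzer" },
      "office": {
        "properties": {
          "name": { "type": "text", "analyzer": "my_analyzer" }
        }
      }
    }
  }
}
Because the same analyzer is applied to the query string at search time by default, your existing multi_match query should then return the document whether the user types O'Ryan or ORyan, since both are analyzed to the same token oryan.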
Upvotes: 1