Reputation: 13
I have an index that uses a whitespace analyzer - see below:
{
  "my-index": {
    "settings": {
      "index": {
        "number_of_shards": "15",
        "provided_name": "my-index",
        "creation_date": "1638550619099",
        "analysis": {
          "normalizer": {
            "lowercase_normalizer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "char_filter": []
            }
          },
          "analyzer": {
            "my_analyzer": {
              "filter": [
                "lowercase"
              ],
              "char_filter": [],
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "WrteqKeaTwuGGEXOpckwQw",
        "version": {
          "created": "7090199"
        }
      }
    }
  }
}
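(For intuition, the whitespace tokenizer simply splits on whitespace, so a lone "-" surrounded by spaces survives as its own token; the lowercase token filter is then applied on top. A minimal Python sketch of the tokenizer behavior, as an approximation rather than the actual Lucene implementation:)

```python
def whitespace_tokenize(text):
    # Approximation of the whitespace tokenizer: split on whitespace only,
    # so punctuation such as "-" is kept as a token when space-delimited.
    return text.split()

tokens = whitespace_tokenize("This - is - an item")
print(tokens)
# → ['This', '-', 'is', '-', 'an', 'item']
```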
I can confirm the analyzer outputs as expected for text with special characters:
curl -X GET "https://xxx/my-index/_analyze?pretty" -H "Content-Type: application/json" -d'{"analyzer": "my_analyzer","text" : ["This - is - an item"]}'
{
  "tokens" : [
    {
      "token" : "This",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "-",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "is",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "-",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "an",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "item",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 5
    }
  ]
}
However, when I specify a wildcard query containing a special character, in this case "-", I get no results back:
my_query = {
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "ec_item_name": {
              "value": "-*"
            }
          }
        }
      ]
    }
  }
}
I understand that wildcard queries are not analyzed, but I don't see how that applies here anyway. If the whitespace analyzer is specified at index time and identifies "-" as a word, how can the wildcard query fail to match? Alphanumeric values don't seem to have this problem.
Upvotes: 0
Views: 1051
Reputation: 16192
You are almost there: the index settings are applied correctly and the analyzer itself is properly defined. However, I believe you have missed adding the analyzer to the ec_item_name field in the properties of the mapping. Without it, the field falls back to the default standard analyzer, which drops punctuation-only tokens such as "-", so no "-" token is ever indexed and the wildcard query "-*" has nothing to match.
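You can see the difference with a rough Python approximation of the two analyzers (this is only a sketch of the tokenization behavior, not the actual Lucene implementations):

```python
import re

def standard_like(text):
    # Rough approximation of the default `standard` analyzer: it tokenizes
    # on non-alphanumeric boundaries, so a bare "-" never becomes a token.
    return [t.lower() for t in re.findall(r"[a-z0-9]+", text, re.IGNORECASE)]

def whitespace_lowercase(text):
    # Approximation of the custom my_analyzer: whitespace tokenizer
    # followed by the lowercase token filter.
    return [t.lower() for t in text.split()]

text = "This - is - an item"
print(standard_like(text))        # ['this', 'is', 'an', 'item']  → "-*" matches nothing
print(whitespace_lowercase(text)) # ['this', '-', 'is', '-', 'an', 'item']
```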
The index mapping and settings should be:
{
  "settings": {
    "index": {
      "number_of_shards": "15",
      "analysis": {
        "normalizer": {
          "lowercase_normalizer": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "type": "custom",
            "char_filter": []
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase"
            ],
            "char_filter": [],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "ec_item_name": {
        "type": "text",
        "analyzer": "my_analyzer" // note this
      }
    }
  }
}
Index Data
{
  "ec_item_name": ["This - is - an item"]
}
Search Query:
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "ec_item_name": {
              "value": "-*"
            }
          }
        }
      ]
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "70218546",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.0,
    "_source": {
      "ec_item_name": [
        "This - is - an item"
      ]
    }
  }
]
Upvotes: 0