Ryanmc174

Elasticsearch Wildcard Query not matching special characters with whitespace analyzer

I have an index that uses a custom analyzer with a whitespace tokenizer; see the settings below:

{
  "my-index": {
    "settings": {
      "index": {
        "number_of_shards": "15",
        "provided_name": "my-index",
        "creation_date": "1638550619099",
        "analysis": {
          "normalizer": {
            "lowercase_normalizer": {
              "filter": [
                "lowercase",
                "asciifolding"
              ],
              "type": "custom",
              "char_filter": []
            }
          },
          "analyzer": {
            "my_analyzer": {
              "filter": [
                "lowercase"
              ],
              "char_filter": [],
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "WrteqKeaTwuGGEXOpckwQw",
        "version": {
          "created": "7090199"
        }
      }
    }
  }
}

I can confirm that the analyzer produces the expected tokens for text with special characters:

!curl -X GET "https://xxx/my-index/_analyze?pretty" -H "Content-Type: application/json" -d'{"analyzer": "my_analyzer","text" : ["This - is - an item"]}'
{
  "tokens" : [
    {
      "token" : "This",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "-",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "is",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "-",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "an",
      "start_offset" : 12,
      "end_offset" : 14,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "item",
      "start_offset" : 15,
      "end_offset" : 19,
      "type" : "word",
      "position" : 5
    }
  ]
}
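To make concrete what my_analyzer does at index time, here is a minimal pure-Python sketch of a whitespace tokenizer followed by a lowercase filter, returning tokens in the same shape as the _analyze response. It is an approximation, not Elasticsearch's implementation: it assumes Python's `\S+` notion of whitespace is close enough to the whitespace check Elasticsearch uses, and the helper name `whitespace_lowercase_analyze` is hypothetical.

```python
import re
from typing import Dict, List, Union


def whitespace_lowercase_analyze(text: str) -> List[Dict[str, Union[str, int]]]:
    """Approximate my_analyzer: split on whitespace, then lowercase each token.

    Hypothetical helper for illustration; offsets refer to the original text.
    """
    tokens = []
    # Every maximal run of non-whitespace characters becomes one token,
    # so punctuation such as "-" survives as a token of its own.
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append(
            {
                "token": match.group().lower(),  # the lowercase token filter
                "start_offset": match.start(),
                "end_offset": match.end(),
                "type": "word",
                "position": position,
            }
        )
    return tokens


if __name__ == "__main__":
    for t in whitespace_lowercase_analyze("This - is - an item"):
        print(t["position"], repr(t["token"]), t["start_offset"], t["end_offset"])
```

The offsets and positions line up with the _analyze output above; because the sketch applies the lowercase filter, the first token comes back as "this".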

However, when I run a wildcard query containing a special character, in this case "-", I get no results back:

my_query = {
    "query": {
        "bool": {
            "must": [
                {
                    "wildcard": {
                        "ec_item_name": {
                            "value": "-*"
                        }
                    }
                }
            ]
        }
    }
}

I understand that wildcard queries are not analyzed, but I don't see how that applies here. If the whitespace analyzer is specified at index time and emits "-" as a token, why can't the wildcard query match it? Alphanumeric values don't seem to have this problem.
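Because a wildcard query is not analyzed, the pattern "-*" is compared as-is against each term stored in the inverted index. The sketch below illustrates that mechanism; the helper names are hypothetical, and it handles only the `*` and `?` wildcards (Lucene's backslash escaping is ignored). The term lists are the distinct terms each analyzer produces for "This - is - an item": my_analyzer keeps "-", while the standard analyzer, which a text field gets when no analyzer is assigned, drops it.

```python
import re
from typing import Iterable, List


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a wildcard pattern (* and ? only) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def matching_terms(pattern: str, terms: Iterable[str]) -> List[str]:
    """Return the indexed terms that the un-analyzed pattern matches in full."""
    regex = wildcard_to_regex(pattern)
    return [term for term in terms if regex.fullmatch(term)]


# Distinct terms indexed for "This - is - an item" by each analyzer.
WHITESPACE_TERMS = ["-", "an", "is", "item", "this"]  # my_analyzer keeps "-"
STANDARD_TERMS = ["an", "is", "item", "this"]  # standard analyzer drops "-"

if __name__ == "__main__":
    print(matching_terms("-*", WHITESPACE_TERMS))  # ['-']
    print(matching_terms("-*", STANDARD_TERMS))  # []
    print(matching_terms("ite*", STANDARD_TERMS))  # ['item']
```

This mirrors the behaviour described in the question: an alphanumeric pattern such as "ite*" matches under either analyzer, but "-*" can only find a document if the field was actually indexed with my_analyzer.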

Answers (1)

Bhavya

You are almost there. The index settings are applied correctly, and the analyzer is properly defined.

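I believe you missed assigning the analyzer to the ec_item_name field in the properties of the mapping, so the analyzer is never applied to that field. Without an explicit analyzer (and with no default analyzer configured for the index), a text field is indexed with the standard analyzer, whose tokenizer discards punctuation such as "-". Since the wildcard pattern is not analyzed, "-*" is matched against the indexed terms, and there is no "-" term for it to match.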

The index mapping and settings should be:

{
  "settings": {
    "index": {
      "number_of_shards": "15",
      "analysis": {
        "normalizer": {
          "lowercase_normalizer": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "type": "custom",
            "char_filter": []
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "lowercase"
            ],
            "char_filter": [],
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "properties": {
      "ec_item_name": {
        "type": "text",
        "analyzer": "my_analyzer"   // note this
      }
    }
  }
}
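One way to confirm the fix took effect is to read the mapping back with `GET /my-index/_mapping` and check which analyzer the field uses. The helper below is a hypothetical sketch written against the Elasticsearch 7.x response shape `{index: {"mappings": {"properties": ...}}}`; it reports the analyzer of a top-level text field and falls back to "standard" on the assumption that no index-level `default` analyzer is configured, as is the case in the settings above.

```python
from typing import Any, Dict, Optional


def text_field_analyzer(
    mapping_response: Dict[str, Any], index: str, field: str
) -> Optional[str]:
    """Return the index-time analyzer of a top-level text field.

    Returns None if the field is missing or is not a text field.
    Assumes the ES 7.x GET /<index>/_mapping response shape and that no
    index-level "default" analyzer is configured.
    """
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    field_mapping = properties.get(field)
    if field_mapping is None or field_mapping.get("type") != "text":
        return None
    # Text fields without an explicit analyzer fall back to "standard",
    # whose tokenizer discards punctuation such as "-".
    return field_mapping.get("analyzer", "standard")


# Roughly what dynamic mapping creates when no mapping is defined up front.
DYNAMIC = {"my-index": {"mappings": {"properties": {"ec_item_name": {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}}}}}

# The mapping recommended in this answer.
EXPLICIT = {"my-index": {"mappings": {"properties": {"ec_item_name": {
    "type": "text",
    "analyzer": "my_analyzer",
}}}}}

if __name__ == "__main__":
    print(text_field_analyzer(DYNAMIC, "my-index", "ec_item_name"))  # standard
    print(text_field_analyzer(EXPLICIT, "my-index", "ec_item_name"))  # my_analyzer
```

If the field already exists with the standard analyzer, its analyzer cannot be changed in place; the documents need to be reindexed into a new index created with the mapping above.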

Index Data:

{
  "ec_item_name": ["This - is - an item"]
}

Search Query:

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "ec_item_name": {
              "value": "-*"
            }
          }
        }
      ]
    }
  }
}

Search Result:

"hits": [
      {
        "_index": "70218546",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "ec_item_name": [
            "This - is - an item"
          ]
        }
      }
    ]
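The question builds the query as a Python dict (my_query). A minimal sketch for sending it with only the standard library follows; `ES_URL` is a placeholder for the elided host (`https://xxx`), `build_search_request` is a hypothetical helper, and authentication is not handled. It only constructs the request; passing it to `urllib.request.urlopen` would actually send it.

```python
import json
import urllib.request
from typing import Any, Dict

ES_URL = "https://xxx"  # placeholder for the elided host in the question


def build_search_request(index: str, query: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST /<index>/_search request with a JSON body (not sent)."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


my_query = {
    "query": {"bool": {"must": [{"wildcard": {"ec_item_name": {"value": "-*"}}}]}}
}

if __name__ == "__main__":
    req = build_search_request("my-index", my_query)
    print(req.get_method(), req.full_url)
    # To send it against a real cluster:
    # with urllib.request.urlopen(req) as resp:
    #     hits = json.load(resp)["hits"]["hits"]
```

POST is used here because it carries a request body with any HTTP client; Elasticsearch accepts both GET and POST for _search.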
