Dhruv Pal

Reputation: 957

Using a normalizer with the keyword data type in Elasticsearch gives unexpected results

I created an index as follows:

PUT twitter
{
  "settings": {
    "index": {
      "analysis": {
        "normalizer": {
          "caseinsensitive_exact_match_normalizer": {
            "filter": "lowercase",
            "type": "custom"
          }
        },
        "analyzer": {
          "whitespace_lowercasefilter_analyzer": {
            "filter": "lowercase",
            "char_filter": "html_strip",
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      }
    }
  },

  "mappings": {
    "test" : {
      "properties": {
        "col1" : {
          "type": "keyword"
        },
        "col2" : {
          "type": "keyword",
          "normalizer": "caseinsensitive_exact_match_normalizer"
        }
      }
    }
  }
}

Then I indexed a document:

POST twitter/test
{
  "col1" : "Dhruv",
  "col2" : "Dhruv"
}

Then I queried the index:

GET twitter/_search
{
  "query": {
    "term": {
      "col2": {
        "value": "DHRUV"
      }
    }
  }
}

and I got this result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "twitter",
        "_type": "test",
        "_id": "AV9yNWQb3aJEm8NgRhd_",
        "_score": 0.2876821,
        "_source": {
          "col1": "Dhruv",
          "col2": "Dhruv"
        }
      }
    ]
  }
}

As I understand it, we should not get a result: a term query skips analysis, so it should look up DHRUV in the inverted index, while the value stored in the index should be dhruv because col2 uses caseinsensitive_exact_match_normalizer. I suspect that the term query does not ignore the normalizer. Is that right?

I am using ES 5.4.1

Upvotes: 4

Views: 2242

Answers (1)

Andrei Stefan

Reputation: 52368

It seems it's normal for a term query to apply the field's normalizer when searching. However, according to the issue linked previously, it has been decided that this is not the expected behavior.

If you want to see what query ES rewrites yours into, you can use something like this:

GET /_validate/query?index=twitter&explain
{
  "query": {
    "term": {
      "col2": {
        "value": "DHRUV"
      }
    }
  }
}

which will show you why you get those results:

  "explanations": [
    {
      "index": "twitter",
      "valid": true,
      "explanation": "col2:dhruv"
    }
  ]
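
To make the col2:dhruv explanation concrete, here is a toy Python model of the behavior described above: the normalizer runs on the value at index time and again on the term at query time. This is only a sketch and not how Elasticsearch is implemented; the KeywordField class and its methods are hypothetical names invented for illustration, and str.lower merely stands in for the lowercase filter.

```python
# Toy model of a keyword field; NOT Elasticsearch's actual implementation.
from collections import defaultdict


class KeywordField:
    """Keyword field with an optional normalizer (hypothetical helper)."""

    def __init__(self, normalizer=None):
        # With no normalizer, values are indexed verbatim.
        self.normalizer = normalizer or (lambda value: value)
        self.postings = defaultdict(set)  # indexed term -> set of doc ids

    def index(self, doc_id, value):
        # The normalizer is applied before the value is stored.
        self.postings[self.normalizer(value)].add(doc_id)

    def term_query(self, value):
        # The same normalizer is applied to the query term before lookup.
        return set(self.postings.get(self.normalizer(value), set()))


col1 = KeywordField()                      # plain keyword, like col1
col2 = KeywordField(normalizer=str.lower)  # stands in for the lowercase filter

col1.index("doc1", "Dhruv")
col2.index("doc1", "Dhruv")

print(col1.term_query("DHRUV"))  # set(): "DHRUV" does not equal "Dhruv"
print(col2.term_query("DHRUV"))  # {'doc1'}: the term is looked up as "dhruv"
```

In this model the term lookup on col2 matches because both sides are lowercased, which mirrors the rewritten query shown in the explanation; the same lookup on col1 finds nothing because that field has no normalizer.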

Upvotes: 4
