Otávio Décio

Reputation: 74250

Keyword tokenizer vs not_analyzed

When specifying a filter using term, should the field always be not_analyzed or can it use the keyword analyzer? For example:

        "must_not": [
          {
            "term": {
              "personid": "ABADF00D-BEEF-4218-B59B-A164017A3BA0"
            }
          },

If I want to look for that personid case-insensitively, I might use a keyword tokenizer with a lowercase filter. But that seems to break the term query. Should I stick with not_analyzed in this case?

Upvotes: 0

Views: 886

Answers (1)

Val

Reputation: 217254

Declaring a field as not_analyzed is equivalent to using the keyword tokenizer without any other filters (i.e. no lowercase).
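You can see this with the _analyze API (a quick sketch; the exact request format differs slightly between Elasticsearch versions): the keyword tokenizer alone emits the whole input as one unmodified token, which is exactly how a not_analyzed field is stored.

    POST _analyze
    {
      "tokenizer": "keyword",
      "text": "ABADF00D-BEEF-4218-B59B-A164017A3BA0"
    }

This returns the single token ABADF00D-BEEF-4218-B59B-A164017A3BA0, i.e. the input untouched.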

If you want to be able to search for that field in a case-insensitive way, yet still allow for term queries, you have two options.

Option A: Use a keyword tokenizer + lowercase token filter as you do now, but make sure to lowercase the value in your term query, i.e.

    "must_not": [
      {
        "term": {
          "personid": "abadf00d-beef-4218-b59b-a164017a3ba0"
        }
      },
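The reason is visible again with the _analyze API (a sketch; depending on your Elasticsearch version the filter parameter may be named filters instead):

    POST _analyze
    {
      "tokenizer": "keyword",
      "filter": ["lowercase"],
      "text": "ABADF00D-BEEF-4218-B59B-A164017A3BA0"
    }

The indexed token is abadf00d-beef-4218-b59b-a164017a3ba0, and since a term query is not analyzed at search time, it only matches if you send the already-lowercased value.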

Option B: Use a keyword tokenizer + lowercase token filter as you do now (named your_analyzer below), but also add a raw sub-field which you declare as not_analyzed. Your mapping would then basically look like this:

{
  "personid": {
    "type": "string",
    "analyzer": "your_analyzer",
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}
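For completeness, here is a sketch of how your_analyzer could be defined in the index settings, alongside that mapping. The index name my_index and type name my_type are placeholders, and the string / not_analyzed syntax assumes the same pre-5.x mapping style used above:

    PUT /my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "your_analyzer": {
              "type": "custom",
              "tokenizer": "keyword",
              "filter": ["lowercase"]
            }
          }
        }
      },
      "mappings": {
        "my_type": {
          "properties": {
            "personid": {
              "type": "string",
              "analyzer": "your_analyzer",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }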

You'd then run your term query on the personid.raw sub-field; if you want to search case-insensitively instead, you'd run the query on the analyzed personid field (see the sketch after the example below).

    "must_not": [
      {
        "term": {
          "personid.raw": "ABADF00D-BEEF-4218-B59B-A164017A3BA0"
        }
      },
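For the case-insensitive variant, a match query on the analyzed personid field works whatever casing you send, because the query text goes through the same keyword + lowercase analyzer at search time (a sketch, mirroring the fragment style above):

    "must_not": [
      {
        "match": {
          "personid": "ABADF00D-BEEF-4218-B59B-A164017A3BA0"
        }
      },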

Upvotes: 2
