samidarko

Reputation: 632

Filtering with term on a tag key whose value contains a dot returns no results

I'm using Elasticsearch 5.2.

My query is:

POST /_search
{
  "query": {
    "bool": { 
      "filter": [ 
        { "term":  { "tag": "server-dev.user-log" }} 
      ]
    }
  }
}

I can filter on a tag value like abcd, but it seems I cannot on one like ab.cd.

I guess this is because of the tokenizer. Is there a way to require strict (exact) equivalence? Or, if the problem comes from the ., a way to escape it?

The tag mapping is:

"tag": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
},

Upvotes: 0

Views: 83

Answers (2)

samidarko

Reputation: 632

I finally managed to make it work like this:

POST /_search
{
  "query": {
    "bool": { 
      "filter": [ 
        { "terms":  { "tag": ["server", "dev.user", "log"] }} 
      ]
    }
  }
}

It seems the - is a token delimiter.

I just want to add that my configuration is very standard: I didn't modify the mapping; it was created by fluentd.

=======> EDIT <=======

If you replace tag with tag.keyword, you no longer need the workaround above (which, by the way, does not work for every value):

POST /_search
{
  "query": {
    "bool": { 
      "filter": [ 
        { "term":  { "tag.keyword": "server-dev.user-log" }} 
      ]
    }
  }
}
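
To double-check that the keyword sub-field really keeps the whole value as a single token, you can point the _analyze API at the field (a minimal sketch; my-index is a hypothetical index name, substitute your own):

POST /my-index/_analyze
{
  "field": "tag.keyword",
  "text": "server-dev.user-log"
}

This should return the single token server-dev.user-log, which is why the exact term filter matches.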

Upvotes: 0

Mysterion

Reputation: 9320

Most likely you have the standard analyzer on your tag field, which splits the input server-dev.user-log into several tokens.
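
You can see exactly what it produces with the _analyze API (a minimal sketch using the built-in standard analyzer):

POST /_analyze
{
  "analyzer": "standard",
  "text": "server-dev.user-log"
}

which returns: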

{
    "tokens": [
        {
            "token": "server",
            "start_offset": 0,
            "end_offset": 6,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "dev.user",
            "start_offset": 7,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "log",
            "start_offset": 16,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 2
        }
    ]
}

That's why you don't get a match. To fix it, add a mapping for the tag field with a tokenizer that preserves the whole token. The simplest choice is the keyword tokenizer, with index settings like this:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "keyword"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "tag": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
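
With that mapping in place, the original term filter matches the whole value again (a sketch; it assumes a new index named my-index created with the settings above and the data reindexed into it, since the analyzer of an existing field cannot be changed in place):

POST /my-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "tag": "server-dev.user-log" } }
      ]
    }
  }
}

The keyword tokenizer emits the entire input as one token, so the (unanalyzed) term query finds an exact match.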

Upvotes: 1
