rgov

Reputation: 4329

Wildcard query only returns results if query is exactly "*"

I'm using Elasticsearch 6.7.0, and I'm trying to make a wildcard query, say to select documents where the field datafile_url ends with .RLF.

To start with a simple query, I just use the wildcard * to query for any value:

GET data/_search
{
  "query": {
     "wildcard": {
        "datafile_url": "*"
     }
  }
}

This returns documents, such as this one:

{
  "_index" : "data",
  "_type" : "doc",
  "_id" : "1HzJaWoBVj7X61Ih767N",
  "_score" : 1.0,
  "_source" : {
    "datafile_url" : "/uploads/data/1/MSN001.RLF",
    ...
  }
},

Ok, great. But when I change the wildcard query to *.RLF, I get no results.

Upvotes: 0

Views: 369

Answers (1)

Polynomial Proton

Reputation: 5135

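Short answer: Elasticsearch analyzes a text field with the standard analyzer when the mapping does not specify one, and a wildcard query is not analyzed. Your pattern *.RLF is therefore compared as-is against the lowercased tokens in the index, and none of them match.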

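If you run the wildcard search on the keyword sub-field, it works and returns the expected result (this assumes datafile_url was dynamically mapped, which adds a .keyword sub-field by default):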

GET data/_search
{
  "query": {
     "wildcard": {
        "datafile_url.keyword": "*.RLF"
     }
  }
}

Now, for some background on why it doesn't work without .keyword:

Take a look at this example and try running it on your own index.

POST data/_analyze
{
  "field": "datafile_url",
  "text" : "/uploads/data/1/MSN001.RLF"
}

#Result

{
  "tokens": [
    {
      "token": "uploads",
      "start_offset": 1,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "data",
      "start_offset": 9,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "1",
      "start_offset": 14,
      "end_offset": 15,
      "type": "<NUM>",
      "position": 2
    },
    {
      "token": "msn001",
      "start_offset": 16,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "rlf",
      "start_offset": 23,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}

Notice that none of the tokens in the _analyze result contain the special characters (/ and .), and that they are all lowercased. These tokens are what the inverted index stores, and a wildcard pattern has to match one whole token. For example:

# This will work: "rlf" is one of the indexed tokens.
GET data/_search
{
  "query": {
     "wildcard": {
        "datafile_url": "*rlf"
     }
  }
}

# This will NOT work: the indexed tokens are lowercase, and the wildcard pattern is matched case-sensitively.
GET data/_search
{
  "query": {
     "wildcard": {
        "datafile_url": "*RLF"
     }
  }
}
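
To make the matching rule concrete, here is a small offline sketch in Python. It does not talk to Elasticsearch. It takes the token list from the _analyze result as the terms of the text field, and assumes the .keyword sub-field stores the whole value as a single term. The helper wildcard_matches is a hypothetical name for illustration, and fnmatchcase only approximates a wildcard query (it is close enough for * patterns like these).

```python
from fnmatch import fnmatchcase

# Terms held by the analyzed text field for this document,
# copied from the _analyze result.
TEXT_TERMS = ["uploads", "data", "1", "msn001", "rlf"]

# Assumption: the .keyword sub-field exists and stores the
# original value, unmodified, as one single term.
KEYWORD_TERMS = ["/uploads/data/1/MSN001.RLF"]


def wildcard_matches(pattern, terms):
    """Return True if any single term matches the whole pattern.

    fnmatchcase is case-sensitive, which mirrors the fact that the
    wildcard pattern is not lowercased before it is compared.
    """
    return any(fnmatchcase(term, pattern) for term in terms)


for pattern in ["*", "*rlf", "*RLF", "*.RLF"]:
    print(
        pattern,
        "text:", wildcard_matches(pattern, TEXT_TERMS),
        "keyword:", wildcard_matches(pattern, KEYWORD_TERMS),
    )
```

Under these assumptions, *.RLF and *RLF match only the keyword term, while *rlf matches only the text field, which is the behavior described in the question and in the two queries above.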

You would need to write a custom analyzer, or map the field as keyword, if you want to preserve those special characters and the original case.

Upvotes: 1
