I'm using Elasticsearch 6.7.0, and I'm trying to make a wildcard query, say to select documents where the field datafile_url ends with .RLF.
To start with a simple query, I just use the wildcard * to query for any value:
GET data/_search
{
"query": {
"wildcard": {
"datafile_url": "*"
}
}
}
This returns documents, such as this one:
{
"_index" : "data",
"_type" : "doc",
"_id" : "1HzJaWoBVj7X61Ih767N",
"_score" : 1.0,
"_source" : {
"datafile_url" : "/uploads/data/1/MSN001.RLF",
...
}
},
Ok, great. But when I change the wildcard query to *.RLF, I get no results.
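Short answer: Elasticsearch applies the standard analyzer to a text field when no analyzer is explicitly specified for it, so the value is split into lowercased tokens and no indexed term ends with .RLF.
If you run the wildcard search against the keyword sub-field, which holds the original value unanalyzed (the default dynamic mapping creates it for string fields), it will work and return the expected result: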
GET data/_search
{
"query": {
"wildcard": {
"datafile_url.keyword": "*.RLF"
}
}
}
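The .keyword sub-field only exists if the index relies on the default dynamic mapping for strings, which is an assumption about your index. A quick way to check is to look at the mapping:

```console
GET data/_mapping
```

Under the default dynamic mapping, the entry for datafile_url would be expected to look roughly like this fragment:

```console
"datafile_url": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```

If the fields entry is missing, datafile_url.keyword does not exist and the query above will return nothing. Note also that with ignore_above set to 256, values longer than 256 characters are not indexed in the keyword sub-field.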
Now, for some background on why it doesn't work without .keyword.
Take a look at this example and try running it on your own index.
POST data/_analyze
{
"field": "datafile_url",
"text" : "/uploads/data/1/MSN001.RLF"
}
#Result
{
"tokens": [
{
"token": "uploads",
"start_offset": 1,
"end_offset": 8,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "data",
"start_offset": 9,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "1",
"start_offset": 14,
"end_offset": 15,
"type": "<NUM>",
"position": 2
},
{
"token": "msn001",
"start_offset": 16,
"end_offset": 22,
"type": "<ALPHANUM>",
"position": 3
},
{
"token": "rlf",
"start_offset": 23,
"end_offset": 26,
"type": "<ALPHANUM>",
"position": 4
}
]
}
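Notice that the special characters (/ and .) are missing from the inverted index and that every token is lowercased. A wildcard query is not analyzed and is matched against each of these tokens individually, so it only works when the pattern matches one of the tokens above. For example:
#this will work because the pattern matches the token rlf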
GET data/_search
{
"query": {
"wildcard": {
"datafile_url": "*rlf"
}
}
}
#this will NOT work: the pattern is not analyzed, and the indexed tokens are lowercase.
GET data/_search
{
"query": {
"wildcard": {
"datafile_url": "*RLF"
}
}
}
You would need to write a custom analyzer if you want to preserve those special characters in the datafile_url field itself.
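As a rough sketch of what such a custom analyzer could look like on 6.x: the index name data_v2 and the analyzer name path_lowercase are made up for illustration, and the "doc" mapping type is taken from the search hit in the question. The keyword tokenizer keeps the whole value as a single token (so / and . survive), and the lowercase filter makes matching case-insensitive as long as the pattern is written in lowercase.

```console
# Hypothetical new index; data_v2 and path_lowercase are illustrative names
PUT data_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "datafile_url": {
          "type": "text",
          "analyzer": "path_lowercase"
        }
      }
    }
  }
}

# The indexed token would be /uploads/data/1/msn001.rlf, so use a lowercase pattern
GET data_v2/_search
{
  "query": {
    "wildcard": {
      "datafile_url": "*.rlf"
    }
  }
}
```

The analyzer of an existing field cannot be changed in place, so the documents would have to be reindexed into the new index. Keep in mind that patterns with a leading wildcard can be slow on large indices, because many terms have to be scanned.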