Reputation: 699
I'm having trouble with a wildcard query search.
My goal is that when I search for word1 word2 word3, I get all results in which each word of the search string may have a prefix before it and a suffix after it.
The structure of my index is:
{
  "my_index": {
    "aliases": {},
    "mappings": {
      "properties": {
        "attributes": {
          "properties": {
            "name": {
              "properties": {
                "value": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "settings": {
      ...
    }
  }
}
So I have a text field, attributes.name.value, whose values I want to match.
My index contains objects where the attributes.name values are:
word1
word1suffix
word1 word2
word1 word2suffix
word1 word2 word3
Before running the search, I internally add wildcards before and after each word:
word1 word2 word3 => *word1* *word2* *word3*
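For illustration, that preprocessing step looks roughly like this in Python. The name add_wildcards is made up for this sketch and is not my actual code; it only assumes words are separated by whitespace:

```python
def add_wildcards(search: str) -> str:
    """Wrap every whitespace-separated word in a leading and trailing '*'."""
    # add_wildcards is a hypothetical helper name used only for this example.
    return " ".join(f"*{word}*" for word in search.split())


pattern = add_wildcards("word1 word2 word3")
print(pattern)  # *word1* *word2* *word3*
```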
Then I run this query:
{
  "size": 10,
  "index": "my_index",
  "body": {
    "query": {
      "bool": {
        "should": [
          {
            "wildcard": {
              "attributes.name.value": {
                "value": "*word1* *word2* *word3*",
                "rewrite": "constant_score"
              }
            }
          }
        ],
        "must": [],
        "minimum_should_match": 1
      }
    }
  },
  "explain": false
}
The strange thing is that, even though the index contains an object named exactly word1 word2 word3, I cannot find it with this kind of wildcard search. (I know that a match_phrase or term query would be better in that case; I just want to understand why this simple case does not work.)
If I try with fewer words:
*word1*: I find word1, word1suffix, word1 word2 and word1 word2suffix
*word1* *word2*: I find both word1 word2 and word1 word2suffix
*word1* *word2* *word3*: I find nothing
So it seems that this strange behavior starts when I search with too many words.
For debugging purposes, this is how my values are stored in the index:
{
  "attributes": {
    "name": [{
      "value": "word1 word2 word3"
    }]
  }
}
One last consideration: I managed to find word1 word2 word3 by searching in the field attributes.name.value.keyword (I think .keyword is automatically generated in the index for every text field) rather than attributes.name.value. The problem is that if I use .keyword the analyzer doesn't work, so I think it's not a good solution.
Upvotes: 3
Views: 7660
Reputation: 5486
A wildcard query works on a pattern, so it treats the entire query string as a single pattern. That is probably why it does not match when you add multiple words.
You have two options:
The first is to use a query_string query, as shown below. You can set default_operator to AND or OR depending on your requirements. Internally, this is turned into a bool query:
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "default_field": "value",
            "query": "*word1* *word2* *word3*",
            "default_operator": "AND"
          }
        }
      ]
    }
  }
}
The second is to use multiple wildcard queries, one per term: put them inside must for AND semantics, or inside should for OR semantics:
{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "value": {
              "value": "*word1*"
            }
          }
        },
        {
          "wildcard": {
            "value": {
              "value": "*word2*"
            }
          }
        },
        {
          "wildcard": {
            "value": {
              "value": "*word3*"
            }
          }
        }
      ]
    }
  }
}
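If the search string comes from user input, the per-word clauses above can be generated rather than written by hand. Here is a minimal Python sketch of that idea. The function name build_wildcard_bool and its parameters are my own invention for illustration, and it only builds the request body as a dict; sending it to Elasticsearch is left to whatever client you use:

```python
import json


def build_wildcard_bool(search: str, field: str, operator: str = "AND") -> dict:
    """Build a bool query with one wildcard clause per whitespace-separated word."""
    # build_wildcard_bool is a hypothetical helper, not part of any Elasticsearch client.
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = [
        {"wildcard": {field: {"value": f"*{word}*"}}}
        for word in search.split()
    ]
    if operator == "AND":
        # Every word must match: put the clauses in "must".
        return {"query": {"bool": {"must": clauses}}}
    # Any word may match: put the clauses in "should".
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


body = build_wildcard_bool("word1 word2 word3", "attributes.name.value")
print(json.dumps(body, indent=2))
```

Each word ends up in its own wildcard clause, so every pattern is a single term with no spaces in it.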
Update
"I managed to find that word1 word2 word3 by searching in the field attributes.name.value.keyword (I think .keyword is automatically generated in the index for every text field) rather than attributes.name.value. The problem is that if I use .keyword the analyzer doesn't work, so I think it's not a good solution."
Yes. If you have not configured a mapping, Elasticsearch creates one automatically for each field, and when it detects a field as text type it also creates an inner field of keyword type.
It works because a keyword field does not apply any analyzer: the whole value is indexed as a single term, and the pattern is matched against that entire string. If you run a wildcard query with multiple terms against the attributes.name.value.keyword field, it will work, but it is case sensitive. So if the field value is word1 word2 word3, the query *word1* *word2* *word3* will match, but *Word1* *word2* *word3* will not (note the capital W).
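To make the keyword behavior concrete, here is a toy Python model. It is only an approximation and not how Elasticsearch is implemented: I am using fnmatch.fnmatchcase as a stand-in for wildcard matching (it also understands [] character classes, which the wildcard query does not), and keyword_terms and wildcard_matches are names made up for this sketch:

```python
from fnmatch import fnmatchcase


def keyword_terms(value: str) -> list:
    """Model of a keyword field: no analysis, the whole value is one term."""
    return [value]


def wildcard_matches(pattern: str, terms: list) -> bool:
    """Term-level matching: the pattern must match one whole term, case-sensitively."""
    return any(fnmatchcase(term, pattern) for term in terms)


terms = keyword_terms("word1 word2 word3")
print(wildcard_matches("*word1* *word2* *word3*", terms))  # True
print(wildcard_matches("*Word1* *word2* *word3*", terms))  # False
```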
Why is it not working on a text type field?
Because wildcard is a term-level query and does not apply any analyzer at query time. It treats your entire query as one pattern. You are running it against a text type field, which used the standard analyzer at index time; that analyzer splits your text into multiple terms and indexes each of them separately. The pattern is compared with those individual terms, none of which contains a space, hence it works for one term and not for multiple terms.
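The same idea as a toy Python model, under loud assumptions: standard_analyzer_approx is a crude stand-in I made up (lowercase, then split on anything that is not an ASCII letter or digit), not the real standard analyzer, and fnmatch.fnmatchcase only approximates wildcard matching:

```python
import re
from fnmatch import fnmatchcase


def standard_analyzer_approx(text: str) -> list:
    """Crude stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text.lower()) if tok]


def wildcard_matches(pattern: str, terms: list) -> bool:
    """The pattern is not analyzed; it is compared with each indexed term as a whole."""
    return any(fnmatchcase(term, pattern) for term in terms)


terms = standard_analyzer_approx("word1 word2 word3")
print(terms)                                               # ['word1', 'word2', 'word3']
print(wildcard_matches("*word1*", terms))                  # True
print(wildcard_matches("*word1* *word2* *word3*", terms))  # False
```

In this model no single term contains a space, so a pattern that contains spaces has nothing to match against.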
Performance Impact
It is not recommended to use wildcard patterns that start with * or ?, as this hurts search performance. The documentation mentions this as a warning:
Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.
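A rough intuition for that warning, as a toy Python model. This is not how Lucene actually evaluates wildcards; it only assumes that terms are kept in sorted order, so a pattern with a fixed prefix can jump to the matching range while a leading wildcard has to look at every term. The function names scan_all and scan_prefix are made up for this sketch:

```python
from bisect import bisect_left
from fnmatch import fnmatchcase


def scan_all(terms, pattern):
    """Leading wildcard: every term has to be tested against the pattern."""
    examined, hits = 0, []
    for term in terms:
        examined += 1
        if fnmatchcase(term, pattern):
            hits.append(term)
    return hits, examined


def scan_prefix(sorted_terms, prefix):
    """Pattern 'prefix*': seek to the prefix in the sorted list, stop at the first non-match."""
    examined, hits = 0, []
    for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
        examined += 1
        if not term.startswith(prefix):
            break
        hits.append(term)
    return hits, examined


terms = sorted(["apple", "word1", "word1suffix", "word2", "word2suffix", "word3", "zebra"])
print(scan_all(terms, "*word1*"))   # (['word1', 'word1suffix'], 7)
print(scan_prefix(terms, "word1"))  # (['word1', 'word1suffix'], 3)
```

Both calls return the same hits here, but the leading-wildcard version examines all 7 terms while the prefix version examines only 3.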
Upvotes: 7