Paolo Magnani

Reputation: 699

Elasticsearch wildcard query doesn't work with multi-word values

I'm having some problems with wildcard query searches.

My goal is that when I search for word1 word2 word3, I get every result in which each word of the string may have a prefix and/or suffix around it.

The structure of my index is:

{
  "my_index": {
    "aliases": {},
    "mappings": {
      "properties": {
        "attributes": {
          "properties": {
            "name": {
              "properties": {
                "value": {
                  "type": "text",
                  "fields": {
                    "keyword": {
                      "type": "keyword",
                      "ignore_above": 256
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "settings": {
      ...
    }
  }
}

So I have a text field, attributes.name.value, whose values I want to match.

My index contains objects where attributes.name values are:

Before running the search, I internally add wildcards before and after each word:

word1 word2 word3 => *word1* *word2* *word3*

Then I run this query:

{
  "size": 10,
  "index": "my_index",
  "body": {
    "query": {
      "bool": {
        "should": [
          {
            "wildcard": {
              "attributes.name.value": {
                "value": "*word1* *word2* *word3*",
                "rewrite": "constant_score"
              }
            }
          }
        ],
        "must": [],
        "minimum_should_match": 1
      }
    }
  },
  "explain": false
}

The strange thing is that even though the index contains an object named exactly word1 word2 word3, I cannot find it with this wildcard search. (I know a match_phrase or term query would be better in that case; I just want to understand why this simple case does not work.)

If I try with fewer words like:

So it seems that this strange behavior starts when I search for results with too many words.

Just for debug, my values are stored in the index in this way:

{
    "attributes": {
        "name": [{
            "value": "word1 word2 word3"
        }]
    }
}

One last consideration: I managed to find word1 word2 word3 by searching the field attributes.name.value.keyword (I think .keyword is generated automatically for every text field in the index) instead of attributes.name.value. The problem is that with .keyword the analyzer is not applied, so I don't think it's a good solution.

Upvotes: 3

Views: 7660

Answers (1)

Sagar Patel

Reputation: 5486

A wildcard query matches a pattern against individual terms, so it treats the entire query string, spaces included, as a single pattern. That is most likely why it stops matching once you add multiple words.

You have two options:

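The first is a query_string query, as shown below. Set default_operator to AND or OR depending on your requirements; internally this is turned into a bool query. (The examples use value as a shortened field name; against the index in the question it would be attributes.name.value.)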

{
 "query": {
   "bool": {
     "should": [
       {
        "query_string": {
          "default_field": "value",
          "query": "*word1* *word2* *word3*",
          "default_operator": "AND"
        }
       }
     ]
   }
 }
}
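If the search string comes from user input, a request body like the one above can be assembled in code. The following is only a minimal Python sketch: the helper name build_query_string_query is hypothetical, the field name attributes.name.value is taken from the mapping in the question, and it assumes the words are separated by whitespace and contain no query_string reserved characters (nothing is escaped). The bool/should wrapper is left out for brevity.

```python
def build_query_string_query(text, field, operator="AND"):
    """Wrap each whitespace-separated word in * and build a query_string body.

    Hypothetical helper: it does not escape query_string reserved characters.
    """
    # "word1 word2 word3" -> "*word1* *word2* *word3*"
    pattern = " ".join(f"*{word}*" for word in text.split())
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": pattern,
                "default_operator": operator,
            }
        }
    }


body = build_query_string_query("word1 word2 word3", "attributes.name.value")
print(body["query"]["query_string"]["query"])  # *word1* *word2* *word3*
```

The resulting dictionary can be serialized to JSON and sent as the search body.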

The second is to use multiple wildcard queries: put them inside must for AND semantics, or inside should for OR semantics:

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "value": {
              "value": "*word1*"
            }
          }
        },
        {
          "wildcard": {
            "value": {
              "value": "*word2*"
            }
          }
        },
        {
          "wildcard": {
            "value": {
              "value": "*word3*"
            }
          }
        }
      ]
    }
  }
}
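Writing one clause per word by hand does not scale to arbitrary input, so the body above could be generated instead. This is a rough Python sketch under a few assumptions: the helper name build_wildcard_bool_query and its match_all flag are hypothetical, words are split on whitespace only, and the field name comes from the question's mapping.

```python
def build_wildcard_bool_query(text, field, match_all=True):
    """Build one wildcard clause per word, combined with must (AND) or should (OR)."""
    clauses = [
        {"wildcard": {field: {"value": f"*{word}*"}}}
        for word in text.split()
    ]
    occur = "must" if match_all else "should"
    bool_query = {occur: clauses}
    if not match_all:
        # Make the "at least one should clause must match" requirement explicit.
        bool_query["minimum_should_match"] = 1
    return {"query": {"bool": bool_query}}


body = build_wildcard_bool_query("word1 word2 word3", "attributes.name.value")
print(len(body["query"]["bool"]["must"]))  # 3
```

Passing match_all=False produces the should variant for OR semantics.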

Update

"I managed to find that word1 word2 word3 by searching in the field attributes.name.value.keyword (I think .keyword is automatically generated in the index on every textual fields) rather than attributes.name.value. The problem is that if I use .keyword the analyzer doesn't work, so I think it's not a good solution."

Yes. If you have not configured a mapping, Elasticsearch creates one automatically for each field, and when it detects a string field it maps it as text with an inner keyword sub-field as well.

It works because the keyword field is not analyzed: the whole value is indexed as a single term and matched exactly. A wildcard query with multiple words against attributes.name.value.keyword will therefore work, but it is case sensitive. If the field value is word1 word2 word3, the pattern *word1* *word2* *word3* matches, but *Word1* *word2* *word3* does not (note the capital W).

Why is it not working on a text field?

Because wildcard is a term-level query, it does not analyze the query text and treats your entire query string as one pattern. You are querying a text field, which was processed by the standard analyzer at index time: the text was split into multiple terms (word1, word2, word3), and each was indexed separately. No single term contains a space, so a pattern spanning several words can never match, while a pattern for one word can.
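The difference between the two fields can be reproduced without a cluster. The Python sketch below is only a simulation, not what Elasticsearch runs internally: standard_like_tokens is a rough stand-in for the standard analyzer (lowercase, split on non-word characters), and fnmatchcase from the standard library stands in for wildcard matching, which is a fair approximation as long as the pattern only uses * and ?. Both function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def standard_like_tokens(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def wildcard_matches_any_term(pattern, terms):
    """Simulate a wildcard query: the pattern must match at least one whole indexed term."""
    return any(fnmatchcase(term, pattern) for term in terms)


value = "word1 word2 word3"
text_terms = standard_like_tokens(value)  # ['word1', 'word2', 'word3']
keyword_terms = [value]                   # keyword sub-field: the whole value is one term

multi = "*word1* *word2* *word3*"
print(wildcard_matches_any_term(multi, text_terms))      # False: no term contains a space
print(wildcard_matches_any_term(multi, keyword_terms))   # True: the single term has the spaces
print(wildcard_matches_any_term("*word2*", text_terms))  # True: a one-word pattern fits one term
print(wildcard_matches_any_term("*Word1* *word2* *word3*", keyword_terms))  # False: case sensitive
```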

Performance Impact

Wildcard patterns that start with * or ? are not recommended because they hurt search performance. The documentation carries this warning:

Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.

Upvotes: 7
