Reputation: 5095
When I make this query:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"regexp":{
"main_text": ".*word r.*"
}
}
}
'
I get no results. But when I query:
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"regexp":{
"main_text": ".*word.*"
}
}
}
'
I get results with word (including results with "word r..."). I am using Elasticsearch 6.2.2. Any idea what is going on?
Upvotes: 2
Views: 316
Reputation: 8860
Let's say you have the below sentence
word raincoat bword wordcd
If the field main_text is of type text and uses the default analyzer, i.e. the Standard Analyzer, then the sentence would be broken into the tokens below:
word
raincoat
bword
wordcd
(Yup no spaces)
These tokens are what is actually stored in the inverted index, and when you query using match or even regexp, Elasticsearch tries to match against these tokens.
Note that the inverted index does not store the sentence as is, e.g. "word raincoat", nor does it store "word " (notice the trailing space).
Now, when you use the regex .*word.* you get documents containing word, bword and wordcd, because that is what your inverted index has.
When you use the regex .*word r.* instead, you get no results: a regexp has to match a single term, and the inverted index doesn't store "word raincoat" together as one term.
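To make the mechanics concrete, here is a rough Python sketch of the behaviour described above. It is only a simulation: standard_like_tokens is a hypothetical, simplified stand-in for the Standard Analyzer (lowercase, split on non-word characters), and Python's re module is used in place of Lucene's regexp engine, which happens to agree with it for these two patterns.

```python
import re

def standard_like_tokens(text):
    # Simplified stand-in for the Standard Analyzer (an assumption, not the
    # real analyzer): lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def regexp_matching_terms(pattern, terms):
    # A regexp query is checked against individual terms and has to match
    # the whole term, so fullmatch is used rather than search.
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]

terms = standard_like_tokens("word raincoat bword wordcd")
hits_word = regexp_matching_terms(r".*word.*", terms)
hits_word_r = regexp_matching_terms(r".*word r.*", terms)

print(terms)        # ['word', 'raincoat', 'bword', 'wordcd']
print(hits_word)    # ['word', 'bword', 'wordcd']
print(hits_word_r)  # []
```

No term contains a space, so the second pattern has nothing it can match.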
What you can do is make the field main_text of type keyword. The keyword datatype doesn't go through the analysis phase and therefore keeps the entire value as is, as a single term, in the inverted index. Your regex .*word r.* would then work as expected.
You always search the inverted index, so you only get back what the inverted index stores.
If you need both partial search and exact search, I'd suggest making use of multi-fields for main_text, or whatever field name you intend to use.
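As a rough illustration of the multi-field idea, the Python sketch below models one document indexed two ways. It is a simulation, not Elasticsearch code: the subfield name main_text.raw is an assumption chosen for the example, analyzed_terms is a simplified stand-in for the Standard Analyzer, and Python's re module stands in for Lucene's regexp engine.

```python
import re

def analyzed_terms(text):
    # Simplified stand-in for the Standard Analyzer (an assumption):
    # lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def index_document(value):
    # Hypothetical multi-field layout: "main_text" holds analyzed tokens,
    # "main_text.raw" (assumed name) holds the whole value as one term,
    # the way a keyword subfield would.
    return {
        "main_text": analyzed_terms(value),
        "main_text.raw": [value],
    }

def regexp_hits(index, field, pattern):
    # The pattern has to match an entire term of the chosen field.
    compiled = re.compile(pattern)
    return [term for term in index[field] if compiled.fullmatch(term)]

index = index_document("word raincoat bword wordcd")
analyzed_hits = regexp_hits(index, "main_text", r".*word r.*")
raw_hits = regexp_hits(index, "main_text.raw", r".*word r.*")

print(analyzed_hits)  # []
print(raw_hits)       # ['word raincoat bword wordcd']
```

The same pattern finds nothing in the analyzed field but matches the unanalyzed one, which is why the regexp would be pointed at the keyword subfield while match queries keep using main_text.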
Hope this helps!
Upvotes: 1
Reputation: 46
This is because regexp is a term-level query and not a full-text query. You are probably using a whitespace tokenizer, and then you won't ever find a token containing whitespace.
Upvotes: 0