Reputation: 7484
I am following this tutorial on Elasticsearch.
Two employees have the following 'about' values:
"about": "I love to go rock climbing"
"about": "I like to collect rock albums"
I run the following query:
GET /megacorp/employee/_search {"query":{"match":{"about":"rock coll"}}}
Both of the above entries are returned, but surprisingly with the same score:
"_score": 0.2876821
Shouldn't the second one have a higher score, since its 'about' value contains both 'rock' and 'coll' while the first one contains only 'rock'?
Upvotes: 1
Views: 330
Reputation: 3036
Elasticsearch analyzes each text field before storing it. The default analyzer (the standard analyzer) splits the text into words at word boundaries (whitespace and most punctuation) and lowercases them. The output of the analysis process is a list of tokens, which are matched against the tokens of your query. If any query token exactly matches an indexed token, the document is returned. That being said, your second document doesn't contain the token coll, and that's why both documents get the same score.
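To make the token matching concrete, here is a minimal Python sketch. It is not Elasticsearch code: the `analyze` function is a rough stand-in I am assuming for the standard analyzer (lowercase, split on non-word characters), and `matched_terms` is a hypothetical helper that only checks exact token equality.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

# The two 'about' values from the question.
docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
}

def matched_terms(query, doc_text):
    """Return the query tokens that appear as whole tokens in the document."""
    doc_tokens = set(analyze(doc_text))
    return [t for t in analyze(query) if t in doc_tokens]

for doc_id, about in docs.items():
    print(doc_id, matched_terms("rock coll", about))
# 1 ['rock']
# 2 ['rock']
```

In this simplified model the second document is indexed with the token collect, never coll, so only rock contributes to either match.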
Even if you build your own custom analyzer and use stemming, the word collect won't be stemmed to coll.
You can build a custom analyzer that emits tokens of one character each. Elasticsearch will then treat every single character as a token, and you can search for the existence of any character in your documents.
Upvotes: 0
Reputation: 92854
When we search on a full-text field, we need to pass the query string through the same analysis process as we have when we index a document, to ensure that we are searching for terms in the same form as those that exist in the index.
The analysis process usually consists of normalization and tokenization (the string is split into individual terms by a tokenizer).
As for the match query:
If you run a match query against a full-text field, it will analyze the query string by using the correct analyzer for that field before executing the search. It just looks for the words that are specified.
So, in your match query, Elasticsearch will look for occurrences of the whole, separate words rock and/or coll.
Your second document doesn't contain the separate word coll, but it was matched by the word rock.
Conclusion: the two documents have the same _score value (both were matched by the same word, rock).
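As a rough check on the number itself, the sketch below reproduces 0.2876821 from the classic Lucene BM25 formula. The inputs are assumptions, not something the question states: I am assuming per-shard statistics where the shard holds a single document containing rock (doc_count = 1, doc_freq = 1), a field length equal to the shard average (6 tokens), and the default k1 = 1.2 and b = 0.75. Newer Elasticsearch versions scale BM25 differently, so treat this as an illustration only.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """Inverse document frequency as used by Lucene's BM25 similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_len, avg_field_len, k1=1.2, b=0.75):
    """Term-frequency normalization (classic form, including the k1 + 1 factor)."""
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * field_len / avg_field_len))

# Assumed inputs: one document on the shard, one occurrence of "rock",
# field length equal to the average field length.
score = bm25_idf(doc_count=1, doc_freq=1) * bm25_tf_norm(freq=1, field_len=6, avg_field_len=6)
print(round(score, 7))  # 0.2876821
```

Under these assumptions only the term rock enters the calculation for either document, which is consistent with both receiving the same score.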
Upvotes: 1
Reputation: 615
That totally depends on which analyzer you are using. If you are using the standard or english analyzer, this result is correct. I recommend spending some time with Elasticsearch's Analyze API to get familiar with how each analyzer affects your text.
By the way, if you want the second document to have a higher score, take a look at partial matching.
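One common partial-matching approach is to index edge n-grams (prefixes) of each word, which Elasticsearch offers through its edge_ngram tokenizer and token filter. The Python sketch below only simulates the idea and does not call Elasticsearch: `edge_ngrams`, `index_tokens`, and `matched_terms` are hypothetical helpers, and the gram sizes are arbitrary assumptions.

```python
import re

def edge_ngrams(token, min_gram=1, max_gram=10):
    """All prefixes of the token between min_gram and max_gram characters long."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_tokens(text):
    """Index-time analysis: lowercase, split into words, emit edge n-grams per word."""
    tokens = set()
    for word in re.findall(r"\w+", text.lower()):
        tokens.update(edge_ngrams(word))
    return tokens

def matched_terms(query, doc_text):
    """Search-time analysis keeps whole query words and looks them up among the indexed prefixes."""
    indexed = index_tokens(doc_text)
    return [t for t in re.findall(r"\w+", query.lower()) if t in indexed]

print(matched_terms("rock coll", "I love to go rock climbing"))     # ['rock']
print(matched_terms("rock coll", "I like to collect rock albums"))  # ['rock', 'coll']
```

With prefixes in the index, the second document matches both query terms while the first still matches only rock, so the second would be expected to score higher.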
Upvotes: 2