I'm using Elasticsearch 5.2. I'm executing the query below against an index that has only one document.
Query:
GET test/val/_validate/query?pretty&explain=true
{
  "query": {
    "bool": {
      "should": {
        "multi_match": {
          "query": "alkis stackoverflow",
          "fields": [
            "name",
            "job"
          ],
          "type": "most_fields",
          "operator": "AND"
        }
      }
    }
  }
}
Document:
PUT test/val/1
{
  "name": "alkis stackoverflow",
  "job": "developer"
}
The explanation of the query is:
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow))) #(#_type:val)
I read this as: field job must contain both alkis and stackoverflow AND field name must contain both alkis and stackoverflow.
This is not the case with my document, though. The AND between the two fields actually behaves as an OR (judging from the result I'm getting).
When I change the type to best_fields, I get:
+(((+job:alkis +job:stackoverflow) | (+name:alkis +name:stackoverflow))) #(#_type:val)
which is the correct explanation.
Is there a bug in the validate API? Have I misunderstood something? Isn't scoring the only difference between these two types?
Since you picked the most_fields type with an explicit AND operator, one match query is generated per field, and all terms must be present in a single field for a document to match. That is your case: both terms alkis and stackoverflow are present in the name field, which is why the document matches.
So, in the explanation of the corresponding Lucene query:
+(((+job:alkis +job:stackoverflow) (+name:alkis +name:stackoverflow)))
the two parenthesized groups have no operator prefix. A clause without a + (must) or - (must not) prefix is an optional (should) clause, so the default operator between them is OR. You therefore need to read it as: field job must contain both alkis and stackoverflow, OR field name must contain both alkis and stackoverflow.
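To make the two readings concrete, here is a minimal Python sketch of the matching logic only (no scoring). It assumes the standard analyzer can be approximated by lowercasing and splitting on non-word characters; the function names are hypothetical and this is not how Elasticsearch is implemented internally. `most_fields_matches` models the Lucene query above (should clauses, one per field, each requiring all terms), while `and_across_fields_matches` models the reading from the question (all terms in every field).

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption): lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def field_has_all(doc, field, terms):
    """One per-field clause with operator AND, e.g. (+name:alkis +name:stackoverflow)."""
    tokens = set(analyze(doc.get(field, "")))
    return all(t in tokens for t in terms)


def most_fields_matches(doc, fields, query):
    """The per-field clauses are SHOULD clauses: one fully matching field is enough."""
    terms = analyze(query)
    return any(field_has_all(doc, f, terms) for f in fields)


def and_across_fields_matches(doc, fields, query):
    """The reading from the question: every field would need every term."""
    terms = analyze(query)
    return all(field_has_all(doc, f, terms) for f in fields)


doc = {"name": "alkis stackoverflow", "job": "developer"}
print(most_fields_matches(doc, ["name", "job"], "alkis stackoverflow"))        # True
print(and_across_fields_matches(doc, ["name", "job"], "alkis stackoverflow"))  # False
```

The document matches under the first model because the name field alone satisfies its clause, even though job contains neither term.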
The AND operator you apply only concerns the terms of your query within a single field; it is not an AND across all fields. Said differently, your query is executed as two match queries (one per field) in a bool/should clause, like this:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "job": { "query": "alkis stackoverflow", "operator": "and" }}},
        { "match": { "name": { "query": "alkis stackoverflow", "operator": "and" }}}
      ]
    }
  }
}
In summary, the most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. That is not your case, so you'd probably be better off using cross_fields or best_fields, depending on your use case, but certainly not most_fields.
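For comparison, cross_fields is term-centric rather than field-centric: with operator AND, each term must appear in at least one of the fields, not all terms in the same field. The sketch below models only that matching rule, assuming all fields share the same analyzer (cross_fields groups fields by analyzer) and reusing the same rough tokenizer as an approximation; the function names and the `split_doc` example are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption): lowercase, split on non-word chars."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def cross_fields_matches(doc, fields, query):
    """Term-centric with operator AND: every term must appear in at least one field."""
    field_tokens = [set(analyze(doc.get(f, ""))) for f in fields]
    return all(any(term in toks for toks in field_tokens) for term in analyze(query))


def most_fields_matches(doc, fields, query):
    """Field-centric with operator AND: some single field must contain every term."""
    terms = analyze(query)
    return any(all(t in set(analyze(doc.get(f, ""))) for t in terms) for f in fields)


# Hypothetical document where the two terms are split across fields.
split_doc = {"name": "alkis", "job": "stackoverflow"}
print(cross_fields_matches(split_doc, ["name", "job"], "alkis stackoverflow"))  # True
print(most_fields_matches(split_doc, ["name", "job"], "alkis stackoverflow"))   # False
```

Both rules accept the document from the question, since its name field holds both terms; they only disagree when the terms are spread over several fields.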
UPDATE
When using the best_fields type, ES generates a dis_max query instead of a bool/should, and the | sign (which is not a boolean OR!) separates the sub-queries of a dis_max query.
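On the last question of the post (is scoring the only difference?): both variants accept a document when at least one per-field clause matches, but they combine the per-field scores differently. bool/should (most_fields) adds up the scores of the matching clauses, while dis_max (best_fields) keeps the best clause score plus `tie_breaker` times the others (`tie_breaker` is a real multi_match parameter, 0.0 by default for best_fields). This is a simplified sketch: the function names and clause scores are made up, `None` marks a clause that did not match, and Lucene details such as older versions' coord factor and boosts are ignored.

```python
def bool_should_score(clause_scores):
    """most_fields / bool-should (simplified): sum the scores of matching clauses."""
    matching = [s for s in clause_scores if s is not None]
    return sum(matching) if matching else None  # None: document does not match


def dis_max_score(clause_scores, tie_breaker=0.0):
    """best_fields / dis_max: best clause score plus tie_breaker times the rest."""
    matching = [s for s in clause_scores if s is not None]
    if not matching:
        return None  # same match condition as bool/should: no clause, no hit
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


scores = [0.25, 0.75]  # hypothetical scores of the job and name clauses
print(bool_should_score(scores))  # 1.0
print(dis_max_score(scores))      # 0.75
```

Both functions return `None` under exactly the same condition, so they select the same documents; only the ranking changes.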