I have been trying to understand how the dis_max query works, and I want to validate my understanding. Please check whether I have understood it correctly.
According to the documentation, a dis_max query is:
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
Suppose the documents in our ES cluster are as follows:
{"FOO":"ABC"},{"FOO":"XYZ"},{"FOO":"ABC XYZ"},{"FOO":"ABC DEF"},{"FOO":"DEF"}
and the dis_max query is:
{
  "dis_max": {
    "queries": [
      {
        "match": {
          "FOO": "ABC"
        }
      },
      {
        "match": {
          "FOO": "XYZ"
        }
      }
    ]
  }
}
So, as per the documentation, let us first find the union of the documents returned by the dis_max sub-queries. The union would be {"FOO":"ABC"}, {"FOO":"XYZ"}, {"FOO":"ABC XYZ"}, {"FOO":"ABC DEF"}. In the next step, each document is scored with the maximum score produced for it by any subquery, which would work something like this:
{"FOO":"ABC"} will be scored against {"match":{"FOO": "ABC"}} and {"match":{"FOO": "XYZ"}}, and the maximum of the two scores will be used.
Similarly, {"FOO":"XYZ"} will be scored against {"match":{"FOO": "ABC"}} and {"match":{"FOO": "XYZ"}}, and the maximum of the two scores will be used. This is repeated for every document in the union, and finally the documents are returned sorted by score.
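To make the steps above concrete, here is a minimal Python sketch of my understanding. It is a toy model, not Elasticsearch's actual scoring: toy_match_score is a hypothetical stand-in for the real match score (which Elasticsearch computes with BM25 by default), and run is just an illustrative helper. The tie_breaker handling follows the dis_max documentation, where it defaults to 0.0.

```python
# Toy model of dis_max scoring. This is NOT Elasticsearch's real scoring;
# toy_match_score is a made-up stand-in for the score of a match query.
docs = ["ABC", "XYZ", "ABC XYZ", "ABC DEF", "DEF"]
terms = ["ABC", "XYZ"]  # one match subquery per term


def toy_match_score(text, term):
    """Hypothetical score: 1 / token count if the term occurs, else 0."""
    tokens = text.split()
    return 1.0 / len(tokens) if term in tokens else 0.0


def dis_max_score(sub_scores, tie_breaker=0.0):
    """Best subquery score plus tie_breaker times the other matching scores."""
    matching = [s for s in sub_scores if s > 0]
    if not matching:
        return None  # no subquery matched, so the document is not in the union
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


def run(tie_breaker=0.0):
    """Score every document and return the hits sorted by descending score."""
    hits = []
    for text in docs:
        sub_scores = [toy_match_score(text, t) for t in terms]
        score = dis_max_score(sub_scores, tie_breaker)
        if score is not None:
            hits.append((text, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for text, score in run(tie_breaker=0.0):
        print(text, score)
```

In this sketch, with tie_breaker left at 0.0, {"FOO":"ABC XYZ"} gets no extra credit for matching both subqueries. A non-zero tie_breaker adds that fraction of the other matching subquery's score, and {"FOO":"DEF"} is never returned because no subquery matches it.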
Is this how the dis_max query works, or did I misunderstand or miss anything?