r.r

Reputation: 255

Solr dismax and edismax requests give different results for the same query

I have a query that contains optional ("should" clauses), mandatory, and prohibited tokens. The following two queries return different results, but shouldn't they return the same?

+_query_:"{!type=dismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"

VS

+_query_:"{!type=edismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"

With Minimum "Should" Match parameter:

mm: "2<2 3<3 5<4 7<51%"
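For readers unfamiliar with the conditional mm syntax, here is a minimal Python sketch of how such a spec is usually evaluated. It is not Solr's code; the function name `min_should_match` is made up for illustration, and the logic follows my understanding of the documented behavior (entries of the form `n<value` are checked left to right, and a value applies only when the number of optional clauses is greater than `n`). It also assumes there are no spaces around `<`.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many optional clauses must match for a given mm spec.

    Illustrative re-implementation, not Solr's actual code.
    """
    result = optional_clauses
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec: each "n<value" applies only if count > n.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, value)
        return result
    # Simple spec: an integer or a percentage, possibly negative.
    if spec.endswith("%"):
        raw = optional_clauses * int(spec[:-1]) / 100
    else:
        raw = int(spec)
    calc = int(raw)  # truncate toward zero
    if raw < 0:
        # Negative values mean "this many clauses may be missing".
        calc = optional_clauses + calc
    return max(0, min(optional_clauses, calc))


MM = "2<2 3<3 5<4 7<51%"
for n in range(1, 11):
    print(n, "optional clauses ->", min_should_match(n, MM), "required")
```

With the two optional clauses in the queries above (opt1 and opt2), this sketch gives 2, meaning both optional terms have to match.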

Any ideas? Thanks

Update: there is a document in the Solr index:

{
   ...
   "normalizedField":"opt1 opt3 mandatory"
   ...
}

Searching with the dismax query:

+_query_:"{!type=dismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"

"parsedquery_toString":"+(((normalizedField:opt1) (normalizedField:opt2) +(normalizedField:mandatory) -(normalizedField:prohibited))~2) ()"

returns an empty result (as expected)

BUT

Searching with the edismax query:

+_query_:"{!type=edismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"

"parsedquery_toString": "+((normalizedField:opt1) (normalizedField:opt2) +(normalizedField:mandatory) -(normalizedField:prohibited))"

returns this document. Why?
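The difference between the two parsed queries is the `~2` suffix: the dismax form requires two of the optional clauses to match, while the edismax form has no minimum. A rough Python sketch of that matching rule, applied to the document above, looks like this. The `matches` helper is hypothetical and greatly simplified (it compares whole tokens and ignores analysis and scoring); it is only meant to show why one query rejects the document and the other accepts it.

```python
def matches(doc_tokens, should, must, must_not, mm=0):
    """Simplified boolean matching with a minimum-should-match count.

    Assumes at least one mandatory clause is present, so optional
    clauses only matter through mm.
    """
    tokens = set(doc_tokens)
    if any(term not in tokens for term in must):
        return False  # a mandatory term is missing
    if any(term in tokens for term in must_not):
        return False  # a prohibited term is present
    matched_optional = sum(1 for term in should if term in tokens)
    return matched_optional >= mm


doc = "opt1 opt3 mandatory".split()
clauses = dict(should=["opt1", "opt2"], must=["mandatory"], must_not=["prohibited"])

# dismax parsed query ends in ~2: both optional terms are required.
print("with ~2:", matches(doc, mm=2, **clauses))
# edismax parsed query has no ~N: optional terms are not required.
print("without mm:", matches(doc, mm=0, **clauses))
```

The document contains opt1 but not opt2, so only one optional clause matches. That fails the `~2` requirement but passes when no minimum is applied.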

Upvotes: 1

Views: 1357

Answers (2)

r.r

Reputation: 255

It seems I found the solution. I was using Solr 5.2, which has a known issue (https://issues.apache.org/jira/browse/SOLR-2649). After upgrading to version 5.5.1 the issue is resolved, and edismax works the same as dismax (for my example).

Upvotes: 2

MatsLindh

Reputation: 52792

edismax and dismax are not identical (there wouldn't be any reason for introducing edismax in that case). edismax extends the syntax set and magic of dismax, by introducing several new features:

  • supports the full Lucene query parser syntax.
  • supports queries such as AND, OR, NOT, -, and +.
  • treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.
  • respects the 'magic field' names _val_ and _query_. These are not real fields in the schema, but using them lets you do special things (like a function query in the case of _val_ or a nested query in the case of _query_). If _val_ is used in a term or phrase query, the value is parsed as a function.
  • includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
  • improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
  • includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
  • includes improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf and bq) are also supported.
  • supports pure negative nested queries: queries such as +foo (-foo) will match all documents.
  • lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.

I've bolded the ones that might easily affect scoring, while features such as "pure negative nested queries" will change which documents are included. The same can happen because edismax supports the full Lucene query parser syntax.

The easiest way to actually find out what's happening is to use the debugQuery feature of Solr, so you can see the scores and exactly what the dismax and edismax query is expanded to.
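As a rough illustration, the two requests from the question could be compared with a small Python script like the one below. The Solr URL and the core name `mycore` are placeholders I made up, and the helper names are hypothetical; `debugQuery`, `wt`, and the `parsedquery_toString` field in the debug output are the ones shown in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: adjust host, port, and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def debug_url(parser, base=SOLR_SELECT):
    """Build a select URL for the question's query with debugQuery enabled."""
    local_params = "{!type=" + parser + " mm='2<2 3<3 5<4 7<51%' qf='normalizedField'}"
    q = '+_query_:"' + local_params + ' opt1 opt2 +mandatory -prohibited"'
    params = {"q": q, "debugQuery": "true", "wt": "json"}
    return base + "?" + urlencode(params)


def parsed_query(parser):
    """Fetch the debug output and return the expanded query string.

    Needs a running Solr instance at SOLR_SELECT.
    """
    with urlopen(debug_url(parser)) as response:
        body = json.load(response)
    return body["debug"]["parsedquery_toString"]


# Print the URLs only; call parsed_query("dismax") and
# parsed_query("edismax") against a live server to compare the results.
for name in ("dismax", "edismax"):
    print(name, debug_url(name))
```

If the two `parsedquery_toString` values differ (for example, one ends in `~2` and the other does not), that difference explains the different result sets.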

.. and if dismax works, you can just use that.

Upvotes: 0
