Solr - sort results by maximum matches for OR search on multi-valued field

Let me try to explain my problem. Assume each document has a multi-valued field called "enrolment" that contains the names of students.

Now let's say I search Solr for the names of three students: Manish, Amit, Navin. Solr returns all documents containing any one of these names (which is what I want). Some documents may contain all three names, some two, and some only one. I want the results sorted so that the documents with the most matches come first, followed by those with fewer matches.

I tried adding sort: score desc for this, but it doesn't work as desired because the score is "1" for all matching documents.

How can I achieve the sort order by maximum number of matches for my multi-valued field?

Upvotes: 0

Answers (1)

MatsLindh

Given a multivalued integer field where you want to rank documents by the number of matching values, you can apply a boost query for each value. For example, if you have a series of monitors that come in different sizes, you can apply a boost for each size you are asking for (I hacked this together and tested it with the example docs from the tech core, so that's my example and I'm sticking with it). I have two relevant documents: one named VA902B, whose multivalued sizes field holds the values 23, 28, and 32, and one named 3007WFP, with the values 23, 29, and 36 in the same field.

Here I'm asking for any document, but I want those that have both size 28 and size 23 at the top, then those that have either size 28 or size 23, and then any other document:

?bq=sizes:28&bq=sizes:23&defType=edismax&q=*:*
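As a rough sketch of how such a query string could be assembled programmatically, the Python snippet below emits one bq parameter per value. The helper name build_boost_query is hypothetical (it is not part of Solr or any client library), and only the standard library is used. Note that urlencode percent-encodes characters such as ":" and "*", which Solr decodes back on the server side.

```python
from urllib.parse import parse_qs, urlencode


def build_boost_query(field, values, main_query="*:*"):
    """Hypothetical helper: build an edismax query string with one bq per value."""
    params = [("defType", "edismax"), ("q", main_query)]
    # A list of tuples lets the same key (bq) appear several times.
    params += [("bq", f"{field}:{value}") for value in values]
    return "?" + urlencode(params)


query_string = build_boost_query("sizes", [28, 23])
print(query_string)

# Decoding the string again shows the parameters Solr would receive.
print(parse_qs(query_string[1:]))
```

The sizes field and the values 28 and 23 come from the example above; everything else is an illustrative assumption.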

If I want to limit the set of documents to only those that match either of the sizes, I can use that as my main query:

?defType=edismax&q=sizes:(23%2028)

.. and this is where I discover that your assumption that the score is the same regardless of the number of matches is false. Adding &debugQuery=true to the URL gives us detailed scoring information for each document:

"explain": {
  "VA902B": "\n2.0 = sum of:\n  1.0 = sizes:[23 TO 23]\n  1.0 = sizes:[28 TO 28]\n",
  "3007WFP": "\n1.0 = sum of:\n  1.0 = sizes:[23 TO 23]\n"
},

.. which means that there is no need to apply a boost: the behaviour you want is the standard behaviour for Solr. This was my initial thought, but in that case the queries you gave in the comments should already have given you the correct order.

But I'll show you how my strategy with applying boosts would have worked as well:

?bq=sizes:28&bq=sizes:23&defType=edismax&q=sizes:(23%2028)&debugQuery=true

.. which now tells us that the score for each document has effectively doubled, since each match is scored 1.0 (from the query) + 1.0 (from the boost).

"explain": {
  "VA902B": "\n4.0 = sum of:\n  2.0 = sum of:\n    1.0 = sizes:[23 TO 23]\n    1.0 = sizes:[28 TO 28]\n  1.0 = sizes:[28 TO 28]\n  1.0 = sizes:[23 TO 23]\n",
  "3007WFP": "\n2.0 = sum of:\n  1.0 = sum of:\n    1.0 = sizes:[23 TO 23]\n  1.0 = sizes:[23 TO 23]\n"
},
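The arithmetic in the two explain outputs can be reproduced with a small toy model. This is not Solr's scoring code, only a sketch that assumes every matching clause contributes a constant 1.0, as the explain output shows for this numeric field; the function name match_count_score is made up for illustration.

```python
def match_count_score(doc_values, query_values, boost_values=()):
    """Toy model: each matching query or boost clause adds a constant 1.0."""
    present = set(doc_values)
    query_score = sum(1.0 for value in query_values if value in present)
    boost_score = sum(1.0 for value in boost_values if value in present)
    return query_score + boost_score


# The two example documents and their multivalued sizes field.
docs = {"VA902B": [23, 28, 32], "3007WFP": [23, 29, 36]}

# q=sizes:(23 28) on its own, then with bq=sizes:28 and bq=sizes:23 added.
plain = {name: match_count_score(sizes, [23, 28]) for name, sizes in docs.items()}
boosted = {name: match_count_score(sizes, [23, 28], [28, 23]) for name, sizes in docs.items()}

# Sorting by score descending puts the document with the most matches first.
ranked = sorted(plain, key=plain.get, reverse=True)

print(plain)
print(boosted)
print(ranked)
```

Under that assumption the boosts double every score without changing the relative order, which is why they are not needed here.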

I also tested the q=sizes:(23 28) query with the standard lucene query parser (rather than dismax/edismax, which support bq), and the behaviour was the same.

Upvotes: 1
