jiffybank

Reputation: 83

Solr: Filtering on the number of matches in an OR query to a multivalued field

Given the following example Solr documents:

<doc>
  <field name="guid">1</field>
  <field name="name">Harry Potter</field>
  <field name="friends">ron</field>
  <field name="friends">hermione</field>
  <field name="friends">ginny</field>
  <field name="friends">dumbledore</field>
</doc>
<doc>
  <field name="guid">2</field>
  <field name="name">Ron Weasley</field>
  <field name="friends">harry</field>
  <field name="friends">hermione</field>
  <field name="friends">lavender</field>
</doc>
<doc>
  <field name="guid">3</field>
  <field name="name">Hermione Granger</field>
  <field name="friends">harry</field>
  <field name="friends">ron</field>
  <field name="friends">ginny</field>
  <field name="friends">dumbledore</field>
</doc>

and the following query (or filter query):

friends:ron OR friends:hermione OR friends:ginny OR friends:dumbledore 

all three documents will be returned since they each have at least one of the specified friends.

However, I'd like to set a minimum (and maximum) threshold for how many friends are matched. For example, only return documents that have at least 2 but no more than 3 of the specified friends.

Such a query would only return the third document (Hermione Granger) as she has 3 of the 4 friends specified, while the first (Harry Potter) matches all 4 and the second (Ron Weasley) matches only 1.

Is this possible in a Solr query?

Upvotes: 8

Views: 2484

Answers (2)

jbnunn

Reputation: 6355

You'll want to use a function query. The termfreq function returns how many times a term occurs in a field for a given document, so you can use it to count the terms ("friends" in your case) that matched. Sum the results, then return only the documents within your threshold using frange, like this:

{!frange l=2 u=3}sum(termfreq(friends,'ron'),termfreq(friends,'hermione'),termfreq(friends,'ginny'),termfreq(friends,'dumbledore'))

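termfreq(...) will return 1 for each friend found and 0 for each friend not found (assuming each friend is listed at most once per document), and the sum of those is what gets tested against your threshold: the lower and upper bounds, l and u, given at the beginning of the !frange statement. Both bounds are inclusive by default.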
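To make the arithmetic concrete, here is a small Python sketch that mimics the sum-and-range check on the three example documents from the question. It is only a toy model, not Solr: the `termfreq` and `match_count` helpers are hypothetical stand-ins written for illustration.

```python
# Toy model of the three example documents from the question (not Solr itself).
docs = [
    {"name": "Harry Potter", "friends": ["ron", "hermione", "ginny", "dumbledore"]},
    {"name": "Ron Weasley", "friends": ["harry", "hermione", "lavender"]},
    {"name": "Hermione Granger", "friends": ["harry", "ron", "ginny", "dumbledore"]},
]
wanted = ["ron", "hermione", "ginny", "dumbledore"]


def termfreq(doc, term):
    """Hypothetical stand-in for Solr's termfreq: occurrences of term in 'friends'."""
    return doc["friends"].count(term)


def match_count(doc):
    """Equivalent of sum(termfreq(friends,'ron'), termfreq(friends,'hermione'), ...)."""
    return sum(termfreq(doc, term) for term in wanted)


# Equivalent of {!frange l=2 u=3}: keep documents whose sum is in [2, 3].
lower, upper = 2, 3
counts = {doc["name"]: match_count(doc) for doc in docs}
matches = [doc["name"] for doc in docs if lower <= match_count(doc) <= upper]

print(counts)   # {'Harry Potter': 4, 'Ron Weasley': 1, 'Hermione Granger': 3}
print(matches)  # ['Hermione Granger']
```

Only Hermione Granger falls inside the range, which matches the result the question asks for.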

You can place this in either the q parameter or the fq parameter. Here it is in the Solr admin panel for your reference:

[Screenshot: the query entered in the Solr admin panel]
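If you are sending the request from code rather than the admin panel, the filter needs to be URL-encoded. A minimal sketch of building the request URL in Python follows; the host, port, and core name (`people`) are assumptions for illustration, and `build_select_url` is a hypothetical helper, not part of any Solr client library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, field, terms, lower, upper):
    """Build a /select URL whose fq keeps docs matching lower..upper of the terms."""
    termfreqs = ",".join(f"termfreq({field},'{term}')" for term in terms)
    # Doubled braces produce the literal { and } of the {!frange ...} local params.
    fq = f"{{!frange l={lower} u={upper}}}sum({termfreqs})"
    params = {"q": "*:*", "fq": fq}
    return f"{base_url}/select?{urlencode(params)}"


# Assumed Solr location and core name; adjust to your installation.
url = build_select_url(
    "http://localhost:8983/solr/people",
    "friends",
    ["ron", "hermione", "ginny", "dumbledore"],
    2,
    3,
)
print(url)

# Decode the query string again to check that the filter survived encoding.
decoded_fq = parse_qs(urlsplit(url).query)["fq"][0]
print(decoded_fq)
```

Using fq rather than q keeps the range check out of relevance scoring and lets Solr cache the filter separately.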

Upvotes: 11

Persimmonium

Reputation: 15771

The easiest way I can see is to add an 'nbOfFriends' field and populate it either at the source or in an UpdateProcessor.

If you don't want to add this additional field, you might look at Joins, but I am not sure whether a join lets you query the number of children, so you should check.

Upvotes: 0
