Rick Hodder

Reputation: 2252

SOLR Score Range Changed

I am migrating from SOLR 4.10.2 to SOLR 7.1.

All seems to be going well, except for one thing: the scores coming back for the resulting documents are different.

The core uses a schema. Here's the schema info for the field that I am searching on:

<field name="IDX_Company" type="text_general" indexed="true" stored="false" multiValued="true" />
<field name="Company" type="string" indexed="true" stored="true"/>
<copyField source="Company" dest="IDX_Company"/>

When searching maxrows=750, fields: *,score

IDX_Company:(cat and scratch)

SOLR 7.1: max score 6.95 and a min of 6.28

SOLR 4.10.2: max score 8.63 and a min of 0.91

IDX_InsuredName:(cat and scratch and fever)

SOLR 7.1: max score 12.99 and a min of 11.25

SOLR 4.10.2: max score 3.97 and a min of 0.77

Notice how the ranges differ: the 7.1 ranges don't go down to 0.x. Also notice that the max score roughly doubles in 7.1 when I add one word to the search terms. Most importantly, the ranges in 4.10.2 overlap, but the ranges in 7.1 don't.

A little more information to show you how I use this information, and why this is causing a problem.

I get one company name like "bobs cabinetry" and another like "all american tech enterprise".

I run two SOLR queries per company name. I'll call them 1-AND, 1-OR, 2-AND, 2-OR.

IDX_Company:(bobs AND cabinetry) &f=*,score,requestid:"1-AND"
IDX_Company:(bobs OR cabinetry) &f=*,score,requestid:"1-OR"
IDX_Company:(all AND american AND tech AND enterprise) &f=*,score,requestid:"2-AND"
IDX_Company:(all OR american OR tech OR enterprise) &f=*,score,requestid:"2-OR"

I combine the results, sort by descending score, and then take the top 750 rows. (The requestid lets me know which query the results came from.)

Because of the changes in the range of scores, the sort pushes all of the "all american tech enterprise" rows to the top of the results (because there is no overlap), and when the top 750 are taken, everything for "bobs cabinetry" is removed from the results.

Is there some config setting I can change to make score calculation act like it did in 4.10.2?

Or something else?

Upvotes: 0

Views: 390

Answers (1)

Persimmonium

Reputation: 15791

For starters, the default similarity changed to BM25 in Solr 6, so that alone would account for a difference. If you want scores that resemble 4.x as closely as possible, I would:

  1. Use the TF-IDF similarity (ClassicSimilarity in Solr 6 and later), see here.
  2. Go over the release notes and see whether some other default changed that affects scores. Use the debug parameters in the request (for example debugQuery=true) to get the explain output, which details how each score is calculated.
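
As a minimal sketch of the first suggestion, assuming the Solr 7.1 core does not already declare a global similarity, the schema could switch back to TF-IDF scoring with the element below. Where it sits in your schema file is an assumption, since the full file isn't shown in the question. Scores may still not match 4.10.2 exactly: as far as I know, Lucene 7 also removed the coord and queryNorm factors from scoring, so treat this as a way to get closer to the old ranges, not a guaranteed restore.

```xml
<!-- Global similarity: a direct child of <schema>. It applies to every field
     type that does not declare its own <similarity> element.
     solr.ClassicSimilarityFactory is the TF-IDF implementation that replaced
     BM25's predecessor as the non-default option in Solr 6 and later. -->
<similarity class="solr.ClassicSimilarityFactory"/>
```

After changing it, reload the core (reindexing is usually advisable too) and rerun one of the queries with debugQuery=true to compare the explain output against what 4.10.2 produced.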

Upvotes: 1
