Tobb

Reputation: 12180

Solr 5 - disable idf scoring

I am using Solr 5.5.0 and have noticed unwanted behaviour with regard to scoring.

The search index is for persons, with fields for givenName and surName. I have weighted givenName a bit higher than surName, but for some queries, the hits from surName are weighted higher than the hits from givenName. This is due to idf-weighting.

As an example, consider the search string "James". Since I weight givenName higher than surName, I would expect the hits with givenName "James" at the top of the results, with the ones with surName "James" ranked lower. But if there are 1000 people with givenName "James" and only ten with surName "James", the latter group gets the highest score due to idf.

Is there a way to disable idf in Solr? All I can find is advice about overriding DefaultSimilarity, but I don't see how to do that with my XML configuration, and the class is deprecated in lucene-5.5.0.

Upvotes: 0

Views: 674

Answers (1)

Peter Dixon-Moses

Reputation: 3209

You probably don't really want to disable idf: without it, a search for [James Garfield] won't recognize that "Garfield" is rarer than "James", and that a match on "Garfield" alone should therefore score higher than a match on "James" alone.

I think what you are really asking for is a combined idf across the two fields. The easiest way to accomplish that is to create a third field, fullName, and search on that.

In your example, the document-frequency for "James" in fullName would be 1010, and the match would score equally regardless of givenName=James vs surName=James.
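To make the numbers concrete, here is a rough sketch of the idf arithmetic. It assumes the classic TF-IDF formula that Lucene 5.x uses by default, idf = 1 + ln(numDocs / (docFreq + 1)), and a made-up index size of 100,000 persons (NUM_DOCS is my assumption, not something from the question). It only looks at the idf factor, not the full score.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """idf as in Lucene 5.x classic TF-IDF: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 100_000  # hypothetical index size, chosen only for illustration

# Separate fields: "James" is common in givenName but rare in surName.
idf_given = classic_idf(1000, NUM_DOCS)
idf_sur = classic_idf(10, NUM_DOCS)

# Combined field: one document frequency (1000 + 10) for every "James" match.
idf_full = classic_idf(1010, NUM_DOCS)

print(f"givenName idf: {idf_given:.3f}")
print(f"surName idf:   {idf_sur:.3f}")
print(f"fullName idf:  {idf_full:.3f}")
```

With separate fields the surName idf comes out well above the givenName idf, which can outweigh a modest boost on givenName. In the combined field both kinds of match share the same idf, so it no longer decides the ordering between them.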

Upvotes: 1
