Solr score is not ordering results by match percent

Question

I am using Solr to search a list of names, with ngrams to account for partial string matching. If I have the names "Rose", "Rosen", "Rosenberg", and "Rosenthal", I would expect a query of "Rose" to return:

Rose
Rosen
Rosenberg
Rosenthal

But what I get is:

Rosenberg
Rosenthal
Rose
Rosen

With all results having the same score. I've tried creating an exact match field and an ngrams field, but that doesn't give me what I want either. When I search "Rose" I get:

Rose
Rosenberg
Rosenthal
Rosen

With only the exact match having a higher score and all others still the same regardless of match percent. If I want to order the results by match percent, and secondarily by alphabetical order, how would I do that?

MatsLindh · Accepted Answer

You don't see a difference because all the names match the same token, and the score is calculated from which tokens are in the index.

A token is a "form" of the word. The ngram filter generates multiple tokens from each word, such as ro, ros and rose. As all the words match the same token, rose, they get the same score.
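To make that concrete, here is a small plain-Java sketch that imitates what a prefix (edge) ngram filter emits. It does not use Lucene or Solr; the `edgeNgrams` helper, the gram sizes (2 to 10) and the lowercasing are my assumptions for illustration, so check them against your own analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class EdgeNgramDemo {
    // Hypothetical helper: returns the lowercase prefixes of length minGram..maxGram,
    // roughly what a prefix (edge) ngram filter would put in the index.
    static List<String> edgeNgrams(String value, int minGram, int maxGram) {
        String lower = value.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Every name produces the token "rose", so a query for "rose"
        // matches the same single token in each document.
        for (String name : List.of("Rose", "Rosen", "Rosenberg", "Rosenthal")) {
            List<String> grams = edgeNgrams(name, 2, 10);
            System.out.println(name + " -> " + grams
                    + " | has \"rose\": " + grams.contains("rose"));
        }
    }
}
```

The longer names produce more tokens, but the query only ever hits the one token they all share.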

A way to solve this is to have two fields - one for the exact match and one for the ngram fields, then weigh these fields differently in qf (if using (e)dismax). That way an exact hit will contribute more to the score.
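As a rough sketch of the weighting, the request parameters could look like the ones built below. The field names `name_exact` and `name_ngram` and the boost of 10 are placeholders I made up; `defType`, `qf` and the `field^boost` syntax are the standard (e)dismax parameters. The code only builds the query string, so it runs without Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class WeightedQueryDemo {
    // Builds edismax parameters that weigh the exact field above the ngram field.
    // "name_exact" and "name_ngram" are hypothetical field names.
    static Map<String, String> params(String userQuery) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", userQuery);
        p.put("defType", "edismax");
        p.put("qf", "name_exact^10 name_ngram");
        return p;
    }

    // URL-encodes the parameters into a query string for a /select request.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params("Rose")));
    }
}
```

Note that this only separates exact hits from partial ones; the partial matches still share one score among themselves.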

Your first example could be achieved by sorting alphabetically on its own (as all the words share the same prefix, that might be what you want).

If you want to sort by token length (assuming this is a field with a single value), there is no function in Solr to retrieve the actual length of the indexed value at the moment. You'd have to index the length of the content in a separate field alongside the field itself, then sort by that as well. That way you'd get shorter matches first.

For example, if your field is name, you could add a field name_length as an integer, then add this field to your document when doing an add:

document.addField("name", name);
document.addField("name_length", name.length()); // or len(name) in python, etc.

Exactly how you do that depends on how you're indexing the content. You can also do it in an update chain in Solr, for example by using JavaScript in a StatelessScriptUpdateProcessor. The manual method might be quicker and easier to implement, but an update chain applies regardless of where the indexing operation comes from (so if you're indexing from many locations / code bases, etc., it might be worth evaluating).
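With the length indexed, the query-time sort could then be something like `score desc, name_length asc, name_sort asc`, where `name_sort` is a hypothetical untokenized copy of the name (an assumption on my part; sorting generally needs a single-valued, non-tokenized field). The plain-Java sketch below just mimics that three-level ordering on made-up scores, to show the result you'd expect; it does not talk to Solr.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LengthSortDemo {
    // The Solr sort this sketch imitates; name_sort is a hypothetical field.
    static final String SOLR_SORT = "score desc, name_length asc, name_sort asc";

    // Minimal stand-in for a search hit; the scores used below are invented.
    static final class Hit {
        private final String name;
        private final double score;

        Hit(String name, double score) {
            this.name = name;
            this.score = score;
        }

        String name() { return name; }
        double score() { return score; }
        int nameLength() { return name.length(); } // what name_length would store
    }

    // Orders by score (highest first), then shortest name, then alphabetically.
    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(Hit::score).reversed()
                .thenComparingInt(Hit::nameLength)
                .thenComparing(Hit::name));
        List<String> names = new ArrayList<>();
        for (Hit hit : sorted) {
            names.add(hit.name());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("Rosenberg", 1.0),
                new Hit("Rosenthal", 1.0),
                new Hit("Rose", 2.0),   // exact match boosted via qf
                new Hit("Rosen", 1.0));
        System.out.println(order(hits));
    }
}
```

Since the partial matches tie on score, the length decides their order, and the alphabetical field only breaks ties between names of equal length.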
