I am new to Solr and studying the basic scoring model. I understand that the basic scoring model uses the Boolean model to select the set of matching documents and then uses the vector space model (VSM) to score them for relevance ranking. What I want to know is: when using proximity searches, are the results also ranked according to the vector space model after they are retrieved, or are they scored only based on the edit distance?
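The two-step model described above (Boolean filtering, then vector-space ranking) can be sketched in plain Java. This is a toy illustration under simplifying assumptions, not Solr's implementation: it uses whitespace tokenization, raw term frequencies with no idf, an AND-only Boolean step, and a hypothetical class name `BooleanThenVsm`.

```java
import java.util.*;

public class BooleanThenVsm {
    // Raw term-frequency vector for a lower-cased, whitespace-tokenized text.
    static Map<String, Integer> tf(String text) {
        Map<String, Integer> v = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) v.merge(t, 1, Integer::sum);
        }
        return v;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            na += e.getValue() * e.getValue();
        }
        for (int x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Step 1 (Boolean): keep only documents containing every query term (AND).
    // Step 2 (VSM): rank the surviving documents by cosine similarity to the query.
    static List<String> search(List<String> docs, String query) {
        Map<String, Integer> q = tf(query);
        List<String> hits = new ArrayList<>();
        for (String d : docs) {
            if (tf(d).keySet().containsAll(q.keySet())) hits.add(d);
        }
        hits.sort(Comparator.comparingDouble((String d) -> cosine(q, tf(d))).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("solr scoring model", "solr solr scoring", "lucene index");
        System.out.println(search(docs, "solr scoring"));
    }
}
```

With these sample documents, "lucene index" is dropped by the Boolean step, and "solr solr scoring" ranks above "solr scoring model" because its term vector points more closely in the query's direction.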
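First of all, the VSM score is implemented in org.apache.lucene.search.similarities.TFIDFSimilarity (keep in mind that it is not the default Similarity in recent versions of Lucene). For example, org.apache.lucene.search.similarities.BM25Similarity, the default since Lucene 6.0, implements a related model: like TF-IDF it treats documents as a bag of words, but it scores them with the probabilistic BM25 function rather than vector-space similarity.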
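To make the TF-IDF vs. BM25 difference concrete, here is a minimal sketch of textbook per-term weights. These are simplified formulas, not Lucene's exact code (which also applies norms, boosts, and version-specific details); the class and method names are hypothetical, and k1 = 1.2, b = 0.75 are assumed to match BM25Similarity's default parameters.

```java
public class TfIdfVsBm25 {
    // Simplified classic TF-IDF weight: sqrt(tf) * idf, with idf = 1 + ln((N + 1) / (df + 1)).
    // Length norms, boosts and query normalization are deliberately omitted.
    static double tfIdf(int tf, long df, long n) {
        double idf = 1.0 + Math.log((n + 1.0) / (df + 1.0));
        return Math.sqrt(tf) * idf;
    }

    // Textbook Okapi BM25 weight for a single term in a single document.
    static double bm25(int tf, long df, long n, double docLen, double avgDocLen, double k1, double b) {
        double idf = Math.log(1.0 + (n - df + 0.5) / (df + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        long n = 1000, df = 10; // hypothetical collection statistics
        for (int tf : new int[] {1, 2, 5, 10, 50}) {
            System.out.printf("tf=%2d  tfidf=%.3f  bm25=%.3f%n",
                tf, tfIdf(tf, df, n), bm25(tf, df, n, 100, 100, 1.2, 0.75));
        }
    }
}
```

The key design difference shows up as tf grows: the TF-IDF weight keeps increasing with sqrt(tf), while the BM25 term-frequency part saturates toward idf * (k1 + 1).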
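In the case of proximity searches, the base class org.apache.lucene.search.similarities.Similarity has a nested class, Similarity.SimScorer, which is responsible for scoring "sloppy" queries such as SpanQuery and PhraseQuery. Typically there is a method calculating sloppyFreq (in TFIDFSimilarity, sloppyFreq(int distance)), which is a function of the edit distance, i.e. how far the matched terms are from the exact phrase positions. Its value is not the whole score: it weights each match's contribution to the phrase frequency, and that frequency then goes through the usual similarity formula. So proximity results are still ranked by the scoring model (TF-IDF or BM25), with closer matches counting more.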
One of the default implementations of sloppyFreq (for example, in ClassicSimilarity) is 1.0f / (distance + 1), but of course it can be customized, depending on your needs.
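A minimal plain-Java sketch of how slop feeds into scoring, assuming a simplified classic-style score of sqrt(freq) * idf and a hypothetical idf value. Each match adds sloppyFreq(distance) to the phrase frequency, so a sloppy match still goes through the same similarity formula, just with a smaller frequency. The steeper function is a hypothetical custom choice, the class name is illustrative, and Lucene's own method returns float rather than double.

```java
import java.util.function.IntToDoubleFunction;

public class SloppyPhraseScoring {
    // Weighting quoted in the answer: an exact match (distance 0) counts 1.0,
    // and each extra position of slop lowers the contribution.
    static final IntToDoubleFunction DEFAULT_SLOPPY_FREQ = d -> 1.0 / (d + 1);

    // Hypothetical custom weighting that penalizes distance more steeply.
    static final IntToDoubleFunction STEEP_SLOPPY_FREQ = d -> 1.0 / ((d + 1.0) * (d + 1.0));

    // Phrase frequency: each match adds sloppyFreq(distance) instead of a flat 1,
    // so the result can be fractional.
    static double phraseFreq(int[] matchDistances, IntToDoubleFunction sloppyFreq) {
        double freq = 0.0;
        for (int d : matchDistances) {
            freq += sloppyFreq.applyAsDouble(d);
        }
        return freq;
    }

    // Simplified classic-style score: sqrt(freq) * idf; norms and boosts are omitted.
    static double score(double freq, double idf) {
        return Math.sqrt(freq) * idf;
    }

    public static void main(String[] args) {
        double idf = 2.0;      // hypothetical idf for the phrase
        int[] exact = {0, 0};  // two exact phrase matches
        int[] sloppy = {1, 3}; // two matches, 1 and 3 positions out of place
        for (IntToDoubleFunction f : new IntToDoubleFunction[] {DEFAULT_SLOPPY_FREQ, STEEP_SLOPPY_FREQ}) {
            System.out.printf("exact=%.3f sloppy=%.3f%n",
                score(phraseFreq(exact, f), idf), score(phraseFreq(sloppy, f), idf));
        }
    }
}
```

With the default weighting, two exact matches give a phrase frequency of 2.0, while matches at distances 1 and 3 give 0.5 + 0.25 = 0.75; the steeper function shrinks the latter to 0.3125, widening the gap between exact and sloppy hits.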