Lucene custom similarity/scoring

Question

I'm looking for a similarity module in Lucene (Java) that produces a weightage-based score. I know this is vague, so it is easier to explain with an example.

Document 1
-----------
Firstname: Francesca

Document 2
-----------
Firstname: Francisco

The Firstname field is analysed with the Double Metaphone and Refined Soundex phonetic algorithms at indexing time.

Therefore, the inverted index looks like this (the FRNS term is produced by Double Metaphone and the F29... terms by Refined Soundex):

francesca ===> Doc1
francisco ===> Doc2
FRNS   ===> Doc1, Doc2
F29083030 ===> Doc1
F2908306 ===> Doc2

Now my search query looks like this: Firstname: "francesca"

Obviously, for Doc1, all 4 terms match. For each match, I want to award 25% (I know in advance that a given term can have at most 4 expanded terms).

Going by this principle, I want to give the following score:

Doc1 (100)  [Reason: All 4 terms match]
Doc2 (25)  [Reason: Only FRNS term matches, rest don't match]
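
To make the intended arithmetic concrete, here is a minimal plain-Java sketch of the per-field score. It does not use any Lucene API; the class name `WeightedTermScore`, the method `score`, and the placeholder terms `t1` to `t4` are assumptions made purely for illustration. Each matched expanded term contributes an equal share of 100, with the maximum number of expanded terms (4 here) known in advance.

```java
import java.util.Set;

public class WeightedTermScore {

    // Hypothetical helper (not a Lucene API): percentage of the query's
    // expanded terms that are present among the document's indexed terms.
    public static double score(Set<String> queryTerms, Set<String> docTerms, int maxExpandedTerms) {
        long matched = queryTerms.stream().filter(docTerms::contains).count();
        // Each match is worth 100 / maxExpandedTerms percent.
        return 100.0 * matched / maxExpandedTerms;
    }

    public static void main(String[] args) {
        // Placeholder terms standing in for the 4 expanded query terms.
        Set<String> query = Set.of("t1", "t2", "t3", "t4");

        // A document containing all 4 expanded terms.
        System.out.println(score(query, Set.of("t1", "t2", "t3", "t4"), 4)); // 100.0

        // A document containing only 1 of the 4 expanded terms.
        System.out.println(score(query, Set.of("t2"), 4)); // 25.0
    }
}
```

This is only the target behaviour I am after; the open question is where such a calculation would plug into Lucene's scoring.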

Now my question is: is there a similarity module available off the shelf that achieves this? If not, I believe I should extend DefaultSimilarity and override the necessary methods. But where is the code that calls the similarity module and sums up all the scores per document? I ask because I will extend this weightage-based scoring to other fields too, in which case the total score per document should be the average of the individual fields' weighted scores. Therefore, I would also need to customise the code that sums up the scores of the individual fields, and override it to compute the average instead. Can someone give me some pointers, please? Thanks.
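
The multi-field combination I have in mind can be sketched in the same way. Again this is plain Java rather than Lucene code, and the class name `FieldAverageScore`, the method `average`, and the `Lastname` field with its score are hypothetical values chosen only to show the averaging step.

```java
import java.util.Map;

public class FieldAverageScore {

    // Hypothetical helper (not a Lucene API): document score as the plain
    // average of the per-field percentage scores; 0.0 when there are no fields.
    public static double average(Map<String, Double> fieldScores) {
        return fieldScores.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        // Firstname scored 25 (1 of 4 terms); Lastname is an assumed
        // second field that scored 75 (3 of 4 terms).
        Map<String, Double> scores = Map.of("Firstname", 25.0, "Lastname", 75.0);
        System.out.println(average(scores)); // 50.0
    }
}
```

What I need to find is the place in Lucene where the per-field (per-clause) scores are added together, so that an average like this can replace the sum.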
