Reputation: 1941
I'm looking for a similarity module in Lucene (Java) that gives a weight-based score. I know this is vague, so let me explain with an example.
Document 1
-----------
Firstname: Francesca
Document 2
-----------
Firstname: Francisco
The Firstname field is analysed with the Double Metaphone and Refined Soundex phonetic algorithms at indexing time.
Therefore, the inverted index looks like this (for each document, the last two terms are produced by Double Metaphone and Refined Soundex respectively):
francesca ===> Doc1
francisco ===> Doc2
FRNS ===> Doc1, Doc2
F29083030 ===> Doc1
F2908306 ===> Doc2
Now my search query looks like this: Firstname: "francesca"
Obviously, for Doc1, all 4 terms match. For each match, I want to award 25% (I know in advance that a given term can expand to at most 4 terms).
Going by this principle, I want to give the following score:
Doc1 (100) [Reason: All 4 terms match]
Doc2 (25) [Reason: Only FRNS term matches, rest don't match]
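To make the intended arithmetic concrete, here is a minimal plain-Java sketch of the per-field score described above. It does not use Lucene; the class name `WeightedMatchScore`, its `score` method, and the placeholder term `ALT_CODE` (standing in for a fourth expanded term) are hypothetical and only illustrate the "25% per matching term" rule.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: scores one field as the
// percentage of expanded query terms that the document contains.
class WeightedMatchScore {

    // maxExpandedTerms is the known upper bound on expansions per query term
    // (4 in the question), so each matching term is worth 100 / maxExpandedTerms.
    static double score(List<String> expandedQueryTerms, Set<String> docTerms, int maxExpandedTerms) {
        if (maxExpandedTerms <= 0) {
            throw new IllegalArgumentException("maxExpandedTerms must be positive");
        }
        int matches = 0;
        // De-duplicate the query terms so a repeated term is counted once.
        for (String term : new HashSet<>(expandedQueryTerms)) {
            if (docTerms.contains(term)) {
                matches++;
            }
        }
        // Cap at the maximum so the score never exceeds 100.
        return 100.0 * Math.min(matches, maxExpandedTerms) / maxExpandedTerms;
    }

    public static void main(String[] args) {
        // "ALT_CODE" is a placeholder for a fourth expanded term (assumption).
        List<String> query = Arrays.asList("francesca", "FRNS", "F29083030", "ALT_CODE");
        Set<String> doc1 = new HashSet<>(Arrays.asList("francesca", "FRNS", "F29083030", "ALT_CODE"));
        Set<String> doc2 = new HashSet<>(Arrays.asList("francisco", "FRNS", "F2908306"));
        System.out.println("Doc1: " + score(query, doc1, 4));
        System.out.println("Doc2: " + score(query, doc2, 4));
    }
}
```

Inside Lucene the same idea would have to be expressed through a custom similarity or query, which is what the question goes on to ask about.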
Now my question is: is there a similarity module available off the shelf that achieves this? If not, I believe I should extend DefaultSimilarity and override the necessary methods. But where is the code that calls the similarity and sums up all the scores per document? The reason I ask is that I will extend this weight-based scoring to other fields too, in which case the total score per document will be the weighted average of the individual field scores. Therefore, I would also need to customise the code that sums up the per-field scores so that it computes an average instead. Can someone give me some pointers, please? Thanks.
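For illustration, the cross-field combination I have in mind can be sketched in plain Java as follows. This is not Lucene's scoring path; the class `FieldScoreCombiner`, the method `weightedAverage`, and the field weights are hypothetical, and fields without an explicit weight are assumed to have weight 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: combines per-field scores
// (each in the range 0..100) into one document score by weighted average.
class FieldScoreCombiner {

    static double weightedAverage(Map<String, Double> fieldScores, Map<String, Double> fieldWeights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map.Entry<String, Double> entry : fieldScores.entrySet()) {
            // Assumption: a field with no configured weight counts with weight 1.
            double weight = fieldWeights.getOrDefault(entry.getKey(), 1.0);
            weightedSum += weight * entry.getValue();
            totalWeight += weight;
        }
        // Avoid dividing by zero when there are no fields or all weights are 0.
        return totalWeight == 0.0 ? 0.0 : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("Firstname", 100.0);
        scores.put("Lastname", 50.0); // hypothetical second field
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("Firstname", 3.0);
        weights.put("Lastname", 1.0);
        System.out.println(weightedAverage(scores, weights));
    }
}
```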
Upvotes: 1
Views: 1200
Reputation: 19283
A good place to start would be Jörg Prante's project: https://github.com/jprante/elasticsearch-payload
Along with other projects, he has also extended the similarity module.
As for the implementation, I would advise you to look at the type or payload attribute of the token to derive the score.
In the following file: https://github.com/jprante/elasticsearch-payload/blob/master/src/main/java/org/xbib/elasticsearch/plugin/payload/PayloadPlugin.java
you can see the following code sample, which shows how to register a similarity module:
public void onModule(SimilarityModule module) {
    module.addSimilarity("payload_similarity", PayloadSimilarityProvider.class);
}
Upvotes: 1