Reputation: 887
I'm using Lucene.NET 3.0.3
How can I modify the scoring of the SpellChecker (or queries in general) using a given function?
Specifically, I want the SpellChecker to score any results that are permutations of the searched word higher than the the rest of the suggestions, but I don't know where this should be done.
I would also accept an answer explaining how to do this with a normal query. I have the function, but I don't know if it would be better to make it a query or a filter or something else.
Upvotes: 2
Views: 814
Reputation: 887
From femtoRgon's advice, here's what I ended up doing:
public class PermutationDistance: SpellChecker.Net.Search.Spell.StringDistance
{
public PermutationDistance()
{
}
public float GetDistance(string target, string other)
{
LevenshteinDistance l = new LevenshteinDistance();
float distance = l.GetDistance(target, other);
distance = distance + ((1 - distance) * PermutationScore(target, other));
return distance;
}
public bool IsPermutation(string a, string b)
{
char[] ac = a.ToLower().ToCharArray();
char[] bc = b.ToLower().ToCharArray();
Array.Sort(ac);
Array.Sort(bc);
a = new string(ac);
b = new string(bc);
return a == b;
}
public float PermutationScore(string a, string b)
{
char[] ac = a.ToLower().ToCharArray();
char[] bc = b.ToLower().ToCharArray();
Array.Sort(ac);
Array.Sort(bc);
a = new string(ac);
b = new string(bc);
LevenshteinDistance l = new LevenshteinDistance();
return l.GetDistance(a, b);
}
}
Then:
_spellChecker.setStringDistance(new PermutationDistance());
List<string> suggestions = _spellChecker.SuggestSimilar(word, 10).ToList();
Upvotes: 1
Reputation: 33341
I think the best way to go about this would be to use a customized Comparator in the SpellChecker object.
Check out the source code of the default comparator here:
Pretty simple stuff, should be easy to extend if you already have the algorithm you want to use to compare the two Strings.
Then you can use set it up to use your comparator with SpellChecker.SetComparator
I think I mentioned the possiblity of using a Filter for this in a previous question to you, but I don't think that's really the right way to go, looking at it a bit more.
EDIT---
Indeed, No Comparator is available in 3.0.3, So I believe you'll need to access the scoring through the a StringDistance object. The Comparator would be nicer, since the scoring has already been applied and is passed into it to do what you please with it. Extending a StringDistance may be a bit less concrete since you will have to apply your rules as a part of the score.
You'll probably want to extend LevensteinDistance (source code), which is the default StringDistance implementation, but of course, feel free to try JaroWinklerDistance as well. Not really that familiar with the algorithm.
Primarily, you'll want to override getDistance and apply your scoring rules there, after getting a baseline distance from the standard (parent) implementation's getDistance call.
I would probably implement something like (assuming you ahve a helper method boolean isPermutation(String, String)
:
class CustomDistance() extends LevensteinDistance{
float getDistance(String target, String other) {
float distance = super.getDistance();
if (isPermutation(target, other)) {
distance = distance + (1 - distance) / 2;
}
return distance;
}
}
To calculate a score half again closer to 1 for a result that is a permuation (that is, if the default algorithm gave distance = .6, this would return distance = .8, etc.). Distances returned must be between 0 and 1. My example is just one idea of a possible scoring for it, but you will likely need to tune your algorithm somewhat. I'd be cautious about simply returning 1.0 for all permutations, since that would be certain to prefer 'isews' over 'weis' when looking with 'weiss', and it would also lose the ability to sort the closeness of different permutations ('isews' and 'wiess' would be equal matches to 'weiss' in that case).
Once you have your Custom StringDistance it can be passed to SpellChecker either through the Constructor, or with SpellChecker.setStringDistance
Upvotes: 1