Reputation: 27727
Is there a Lucene analyzer out there that tokenizes name parts with their short name equivalents (e.g. Mike and Michael, Rich and Richard, Suzie and Susan), etc?
Fuzzy match on Levenshtein distance is a solution I know, and some implementors seem to pair fuzzy match with the soundex algorithm. Surely somebody has made a swipe at just plain listing all of these short names somewhere?
EDIT: The toughest part of this question is where to get the synonym data from?
Upvotes: 3
Views: 601
Reputation: 33341
I am not aware of any specific nickname filter out there.
A SynonymFilter would make it reasonably easy to generate though, if you had a data source for it. This appears to be a good source of nickname data:
https://code.google.com/p/nickname-and-diminutive-names-lookup/
You would need to generate the SynonymMap
to pass into the SynonymFilter
ctor, which should look something like this (I think):
SynonymMap.Builder builder = new SynonymMap.Builder(true);
builder.add(new CharsRef("Mike"), new CharsRef("Michael"), false);
builder.add(new CharsRef("Rich"), new CharsRef("Richard"), false);
builder.add(new CharsRef("Suzie"), new CharsRef("Susan"), false);
SynonymMap map = builder.build();
Upvotes: 5