Jonathan Schneider

Reputation: 27727

Lucene analyzer for first name

Is there a Lucene analyzer out there that tokenizes name parts together with their short-name equivalents (e.g. Mike and Michael, Rich and Richard, Suzie and Susan)?

Fuzzy matching on Levenshtein distance is one solution I know of, and some implementers seem to pair fuzzy matching with the Soundex algorithm. Surely somebody has taken a swipe at simply listing all of these short names somewhere?

EDIT: The toughest part of this question is where to get the synonym data from.

Upvotes: 3

Views: 601

Answers (1)

femtoRgon

Reputation: 33341

I am not aware of any specific nickname filter out there.

A SynonymFilter would make one reasonably easy to build, though, if you had a data source for it. This appears to be a good source of nickname data:

https://code.google.com/p/nickname-and-diminutive-names-lookup/

You would need to build the SynonymMap to pass into the SynonymFilter constructor, which should look something like this (I think):

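import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.util.CharsRef;

// true = deduplicate identical rules
SynonymMap.Builder builder = new SynonymMap.Builder(true);
// add(input, output, includeOrig): with includeOrig = false, "Mike" is replaced by "Michael"
builder.add(new CharsRef("Mike"), new CharsRef("Michael"), false);
builder.add(new CharsRef("Rich"), new CharsRef("Richard"), false);
builder.add(new CharsRef("Suzie"), new CharsRef("Susan"), false);
SynonymMap map = builder.build(); // throws IOException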
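
Since the hard part is getting the data into the map, here is a rough sketch of the loading step using only the JDK. It assumes each row of the nickname file is a full name followed by its nicknames, comma-separated (for example "michael,mike,mick"); I have not verified that the linked project uses exactly this layout. NicknameTable and parse are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: turns nickname CSV rows into
// nickname -> full names, ready to be fed to SynonymMap.Builder.add.
class NicknameTable {

    // Assumed row format: full name first, then its nicknames, comma-separated.
    // Blank rows and rows without any nickname are skipped.
    static Map<String, List<String>> parse(List<String> rows) {
        Map<String, List<String>> fullNamesByNickname = new LinkedHashMap<>();
        for (String row : rows) {
            // Lowercase everything so the entries match lowercased tokens.
            String[] fields = row.trim().toLowerCase(Locale.ROOT).split("\\s*,\\s*");
            if (fields.length < 2 || fields[0].isEmpty()) {
                continue;
            }
            String fullName = fields[0];
            for (int i = 1; i < fields.length; i++) {
                String nickname = fields[i];
                if (nickname.isEmpty()) {
                    continue;
                }
                // One nickname can belong to several full names (mick -> michael, mitchell).
                List<String> fullNames =
                        fullNamesByNickname.computeIfAbsent(nickname, k -> new ArrayList<>());
                if (!fullNames.contains(fullName)) {
                    fullNames.add(fullName);
                }
            }
        }
        return fullNamesByNickname;
    }
}
```

Each (nickname, full name) pair in the result can then go through builder.add(new CharsRef(nickname), new CharsRef(fullName), ...) as above. The sketch lowercases everything on the assumption that the synonym filter sits after a lowercasing filter in the analyzer chain; if yours does not, drop the toLowerCase call.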

Upvotes: 5