Reputation: 1482
I am searching for words in a Lucene-based search and I want to convert strings like 'eating' and 'eats' to 'eat' in Java. I searched and found lemmatization as the solution, but all the English lemmatizer tools that I have come across use a word list or dictionary lookup. Is there a lemmatizer that avoids dictionary lookup and is highly efficient, perhaps one that is based on rules? And yes, I am not looking for a "stemmer". Alternatively, is there any way (it need not be a ready-to-use library; any algorithm or approach would do) to get the root / original word?
Upvotes: 2
Views: 392
Reputation: 3112
There are no purely rule-based lemmatizer tools for English, because for a lot of words it is not possible to construct regular rules, e.g. all irregular verbs or some plural nouns like child/children or man/men. If you are looking for a highly efficient solution, I can recommend looking at the English/Russian morphology project for Lucene. It runs at about 800,000 words per second, consumes a small amount of memory (several megabytes), and provides some heuristics for normalizing unknown words.
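To illustrate why suffix rules alone are not enough, here is a minimal sketch in plain Java. It is not the morphology project's API and not a real lemmatizer: the class name `RuleLemmatizerSketch`, the handful of suffix rules, and the tiny exception table are all hypothetical and chosen only for this example. It shows that regular forms such as 'eating' and 'eats' can be handled by rules, while irregular forms such as 'children' or 'men' need some kind of lookup, however small.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: a few suffix rules plus a tiny exception table.
final class RuleLemmatizerSketch {

    // Irregular forms that no suffix rule can derive (hypothetical, far from complete).
    private static final Map<String, String> IRREGULAR = Map.of(
            "children", "child",
            "men", "man",
            "ate", "eat",
            "eaten", "eat",
            "went", "go");

    static String lemmatize(String word) {
        String w = word.toLowerCase(Locale.ROOT);

        // 1. Exceptions first: this is the unavoidable "dictionary" part.
        String irregular = IRREGULAR.get(w);
        if (irregular != null) {
            return irregular;
        }

        // 2. Regular suffix rules, checked in a fixed order.
        if (w.length() > 5 && w.endsWith("ing")) {
            return undouble(w.substring(0, w.length() - 3));
        }
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        if (w.length() > 4 && w.endsWith("ed")) {
            return undouble(w.substring(0, w.length() - 2));
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    // Collapses a doubled final consonant left after stripping a suffix,
    // e.g. "runn" (from "running") becomes "run".
    private static String undouble(String stem) {
        int n = stem.length();
        if (n >= 3
                && stem.charAt(n - 1) == stem.charAt(n - 2)
                && "bdgmnprt".indexOf(stem.charAt(n - 1)) >= 0) {
            return stem.substring(0, n - 1);
        }
        return stem;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"eating", "eats", "running", "children", "men"}) {
            System.out.println(w + " -> " + lemmatize(w));
        }
    }
}
```

Even this small sketch gets regular words wrong: 'making' comes out as 'mak', because restoring the dropped 'e' cannot be decided from the suffix alone. Cases like that, together with the irregular forms, are why practical lemmatizers rely on a dictionary or a compact morphological model.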
Upvotes: 1