Reputation: 2279
What I want is for results containing 'car' to also show up if I search for 'vehicle' and other such synonyms in the English language.
I know Solr has SynonymFilterFactory, but the synonyms.txt it reads is essentially empty by default. Is there a standard way to map synonyms for the whole English language? Should I generate synonyms.txt from a thesaurus?
Is this standard practice, or is there a better way to handle it?
Upvotes: 2
Views: 746
Reputation: 3209
Take a look at WordNet. It's a widely used lexical database of English that groups words into sets of synonyms. You can access it through Python's NLTK package, and it shouldn't be a lot of work to write a script that dumps it out in the format required by SynonymFilterFactory.
But to @jay's point, you're going to get a lot of hits you probably don't want. Spending some time customizing your thesaurus for your domain will pay dividends!
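To make the "dump it out" step concrete, here is a rough sketch of such a script. It assumes NLTK is installed and the WordNet corpus has been fetched with `nltk.download("wordnet")`. The function names `to_solr_lines`, `wordnet_groups` and `dump_wordnet` are made up for this example, and the output uses the simple comma-separated "equivalent terms" line format of synonyms.txt. Terms containing a comma are skipped rather than escaped, which is a simplification.

```python
from typing import Iterable, Iterator, List


def to_solr_lines(groups: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield one 'a, b, c' line per group of equivalent terms."""
    seen = set()
    for group in groups:
        terms: List[str] = []
        for name in group:
            # WordNet joins multi-word lemmas with underscores.
            term = name.replace("_", " ").lower()
            # Skip terms with commas (simplification) and repeats.
            if "," in term or term in terms:
                continue
            terms.append(term)
        if len(terms) < 2:
            continue  # a single term has no synonyms to list
        line = ", ".join(terms)
        if line not in seen:
            seen.add(line)
            yield line


def wordnet_groups(pos: str = "n") -> Iterator[List[str]]:
    """Yield the lemma names of every WordNet synset for one part of speech."""
    # Imported here so the formatter above works without NLTK installed.
    from nltk.corpus import wordnet as wn
    for synset in wn.all_synsets(pos):
        yield synset.lemma_names()


def dump_wordnet(path: str, pos: str = "n") -> None:
    """Write the WordNet synonym groups to a synonyms.txt-style file."""
    with open(path, "w", encoding="utf-8") as out:
        for line in to_solr_lines(wordnet_groups(pos)):
            out.write(line + "\n")
```

Calling `dump_wordnet("synonyms.txt")` once would produce a file to point SynonymFilterFactory at. Expect it to be large and noisy, which is where the domain-specific pruning comes in. Also note that WordNet synsets hold near-synonyms only: as far as I recall, 'vehicle' is a hypernym of 'car' there rather than a member of the same synset, so that particular pair would still need a hand-written rule or some hypernym expansion.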
Upvotes: 2