Reputation: 141
I am trying to implement morphological search using Solr.
Here's a quick intro to morphological search: it means that the search algorithm considers all grammatical forms of a word when creating the search index and when searching for the requested phrases.
For example, when indexing the word child, the system adds both child and children to the index. A similar rule applies to verbs: for bring, the system adds bringing, brought, etc. Consequently, if a user searches for the phrase "children bring", the system will display all results containing child, children, bring, bringing, brought, etc.
Here are my two options:
1) Lemmatize each token at index time and do the same with the query string at search time.
I don't want to use this approach, because it would make my index inconsistent once I start supporting morphological search: previously indexed documents would lack the lemma tokens. I don't want to reindex either.
2) Only at query time, find all variants of the lemma (e.g. the lemma of 'brought' is 'bring') and generate these as additional tokens through my Token Filter. This would provide morphological search without having to index or reindex anything.
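To make option 2 concrete, here is a minimal sketch of the expansion step in plain Java, outside of Solr. The `QueryExpander` class and its hand-written word table are hypothetical and only stand in for whatever library or dictionary ends up supplying the inflections; in a real setup the same lookup would live inside the Token Filter.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or Lucene.
final class QueryExpander {
    // Assumption: a tiny hand-written table; a real one would be loaded
    // from a morphological dictionary or produced by an inflection library.
    private static final Map<String, List<String>> FORMS_BY_LEMMA = Map.of(
            "child", List.of("child", "children"),
            "bring", List.of("bring", "brings", "bringing", "brought"));

    // Reverse index: inflected form -> lemma.
    private static final Map<String, String> LEMMA_BY_FORM = new HashMap<>();
    static {
        FORMS_BY_LEMMA.forEach((lemma, forms) ->
                forms.forEach(form -> LEMMA_BY_FORM.put(form, lemma)));
    }

    // Returns every known form sharing the token's lemma,
    // or just the lowercased token if it is not in the table.
    static List<String> expand(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String lemma = LEMMA_BY_FORM.get(lower);
        return lemma == null ? List.of(lower) : FORMS_BY_LEMMA.get(lemma);
    }

    // Turns each query token into an OR group of its forms. Groups are
    // separated by a space, so the default query operator decides how
    // the groups are combined.
    static String expandQuery(String query) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : query.trim().split("\\s+")) {
            out.add("(" + String.join(" OR ", expand(token)) + ")");
        }
        return out.toString();
    }
}
```

With this table, `QueryExpander.expandQuery("children bring")` yields `(child OR children) (bring OR brings OR bringing OR brought)`, so documents indexed before the change still match without any reindexing.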
Question:
Are there any good Java libraries that would give me the variants/inflections of a lemma (i.e. the root word; e.g. the lemma of 'brought' is 'bring')?
Upvotes: 0
Views: 225
Reputation: 116
Something close to your requirement is Solr's synonym dictionary and synonym filter. There you can add a base word like child and list its variants, such as kid, children, baby. A collection reload is required each time the dictionary is edited. A search for "kid" would then be performed on every variant of child.
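As a rough sketch of how such a dictionary could be generated rather than typed by hand, the Java helper below formats one line per base word in the comma-separated style that Solr synonym files accept for equivalent terms. The `SynonymLines` class is hypothetical, and where the variants come from is left open. Applying the synonym filter only in the field type's query analyzer should, under that assumption, avoid touching the existing index.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for producing synonym dictionary lines.
final class SynonymLines {
    // Builds one line such as "child, children, kid, baby".
    // The base word comes first; duplicates are dropped, order is kept.
    static String toLine(String baseWord, Collection<String> variants) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(baseWord);
        terms.addAll(variants);
        return String.join(", ", terms);
    }
}
```

For example, `SynonymLines.toLine("child", List.of("children", "kid", "baby"))` gives `child, children, kid, baby`, which can be written as one line of the synonym file before reloading the collection.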
Upvotes: 1