Reputation: 28012
Suppose I store a set of strings in Lucene (each document is a single word). Given an input word W, I would like to retrieve not only the documents that match W exactly but also the documents whose stemmed form matches W.
Conversely, when I input a word W, I also want to handle the case where a document matches the stemmed form of W.
Would writing my own custom Analyzer that wraps the token stream in a PorterStemFilter suffice? If so, do I just need to write this class and use it as the analyzer in my code?
Upvotes: 1
Views: 1264
Reputation: 5487
Writing a custom Analyzer that has a stemmer in the analyzer chain should suffice.
Here is sample code that uses PorterStemFilter in Lucene 4.1:
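import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.util.Version;
class MyAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Split on non-letters and lowercase, then stem each token.
        Tokenizer source = new LowerCaseTokenizer(Version.LUCENE_41, reader);
        return new TokenStreamComponents(source, new PorterStemFilter(source));
    }
}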
Note that you must use the same custom Analyzer at query time as you used at index time, so that the query word is stemmed the same way as the indexed words.
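To see why the analyzer has to match on both sides, here is a minimal plain-Java sketch that does not use Lucene at all. `ToyStemIndex`, `normalize`, `search`, and `searchRaw` are hypothetical names made up for this illustration, and the suffix stripping is a crude stand-in, not the Porter algorithm. It only shows that terms are stored in their normalized form, so a lookup has to normalize the query the same way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory index for illustration only; not Lucene, not Porter. */
class ToyStemIndex {
    // Maps each normalized term to the original words that produced it.
    private final Map<String, List<String>> postings = new HashMap<>();

    /** Hypothetical normalizer: lowercase, then strip one common suffix. */
    static String normalize(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Keep at least three characters so short words stay intact.
            if (w.endsWith(suffix) && w.length() > suffix.length() + 2) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Index time: store the word under its normalized form. */
    void add(String word) {
        postings.computeIfAbsent(normalize(word), k -> new ArrayList<>()).add(word);
    }

    /** Query time, done right: apply the same normalizer as add(). */
    List<String> search(String query) {
        return postings.getOrDefault(normalize(query), new ArrayList<>());
    }

    /** Query time, done wrong: look up the raw query without normalizing. */
    List<String> searchRaw(String query) {
        return postings.getOrDefault(query, new ArrayList<>());
    }
}
```

If you add "walking", "walked", and "walk" to this index, `search("Walks")` finds all three because the query is reduced to the same stored term, while `searchRaw("walking")` finds nothing because no term was ever stored in that unnormalized form. The same mismatch happens in Lucene if you index with a stemming analyzer and query with a non-stemming one.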
You may find the sample code for your version of Lucene in the corresponding PorterStemFilter documentation.
Upvotes: 2