Reputation: 2489
I'm trying to apply more than one filter to the TokenStream in my custom analyzer. The code is as follows:
public class CustomizeAnalyzer extends Analyzer {
    //code omitted
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);
        filter = new StopFilter(Version.LUCENE_44, filter, stopWords);
        return new TokenStreamComponents(source, new PorterStemFilter(source));
    }
}
However, the LowerCaseFilter is never applied. I followed the documentation here exactly. Can someone please explain to me how to make it work?
Many thanks.
Upvotes: 0
Views: 3926
Reputation: 33351
Your problem is in the last line. You create a chain of filters and then short-circuit it in the return statement by passing back new PorterStemFilter(source), which is a stem filter sitting directly on the tokenizer rather than on the filters earlier in the chain. The lowercase and stop filters are therefore never part of the returned stream. It should be:
@Override
protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new LetterTokenizer(Version.LUCENE_44, reader);
    // Each filter wraps the previous stage, so keep reassigning the same variable.
    TokenStream filter = new LowerCaseFilter(Version.LUCENE_44, source);
    filter = new StopFilter(Version.LUCENE_44, filter, stopWords);
    // Wrap the end of the chain (filter), not the tokenizer (source).
    filter = new PorterStemFilter(filter);
    return new TokenStreamComponents(source, filter);
}
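To see why wrapping source instead of filter drops the earlier stages, here is a minimal plain-Java sketch of the same wrapping pattern. It does not use Lucene: Stage, letters, lowerCase, stop, and toyStem are hypothetical stand-ins invented for this illustration, and toyStem only strips a trailing "s" (it is not the Porter algorithm).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins only; none of these names are Lucene APIs.
class FilterChainSketch {
    /** Stand-in for a token stream: produces a list of tokens. */
    interface Stage { List<String> tokens(); }

    static final List<String> STOP_WORDS = Arrays.asList("the", "a");

    /** Stand-in for a letter tokenizer: splits on non-letters. */
    static Stage letters(String text) {
        return () -> {
            List<String> out = new ArrayList<>();
            for (String t : text.split("[^A-Za-z]+")) if (!t.isEmpty()) out.add(t);
            return out;
        };
    }

    /** Wraps a stage and lowercases its tokens. */
    static Stage lowerCase(Stage in) {
        return () -> {
            List<String> out = new ArrayList<>();
            for (String t : in.tokens()) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        };
    }

    /** Wraps a stage and drops tokens found in STOP_WORDS. */
    static Stage stop(Stage in) {
        return () -> {
            List<String> out = new ArrayList<>();
            for (String t : in.tokens()) if (!STOP_WORDS.contains(t)) out.add(t);
            return out;
        };
    }

    /** Toy stemmer (assumption: strips a trailing "s" from tokens longer than 3 chars). */
    static Stage toyStem(Stage in) {
        return () -> {
            List<String> out = new ArrayList<>();
            for (String t : in.tokens())
                out.add(t.length() > 3 && t.endsWith("s") ? t.substring(0, t.length() - 1) : t);
            return out;
        };
    }

    /** Mirrors the question: the stemmer wraps source, so filter is never consumed. */
    static List<String> broken(String text) {
        Stage source = letters(text);
        Stage filter = lowerCase(source);
        filter = stop(filter);
        return toyStem(source).tokens();
    }

    /** Mirrors the fix: the stemmer wraps the end of the chain. */
    static List<String> fixed(String text) {
        Stage source = letters(text);
        Stage filter = lowerCase(source);
        filter = stop(filter);
        filter = toyStem(filter);
        return filter.tokens();
    }

    public static void main(String[] args) {
        System.out.println(broken("The Cats")); // [The, Cat]
        System.out.println(fixed("The Cats"));  // [cat]
    }
}
```

In the broken variant the lowercase and stop stages are constructed but nothing ever pulls tokens through them, which matches the behaviour described in the question.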
Upvotes: 7