abhinavkulkarni

Reputation: 2409

Stemming + stop word filtering in Lucene 4.0+

I used to use SnowballAnalyzer to combine custom stop word filtering with basic stemming, but it has been deprecated. For example, in the index config I could simply specify:

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_32,
                new SnowballAnalyzer(Version.LUCENE_32, "name", stopSet));

where stopSet is my custom list of stopwords.

How do I now create a single analyzer that lets me filter stop words and do basic English stemming?

Thanks.

Upvotes: 0

Views: 886

Answers (1)

femtoRgon

Reputation: 33351

Use EnglishAnalyzer:

new EnglishAnalyzer(Version.LUCENE_32, stopSet)

I'm a little confused about how your listed code does anything useful, since you aren't passing a valid stemmer name into the SnowballAnalyzer constructor. It seems like it should throw an exception right around here:

 Class<?> stemClass = Class.forName("org.tartarus.snowball.ext." + name + "Stemmer");

There is no stemmer class called "org.tartarus.snowball.ext.nameStemmer", so that lookup should fail.
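To illustrate the point, here is a minimal plain-Java sketch of that lookup. It only mirrors the string concatenation and Class.forName call quoted above and does not use Lucene itself. The class name StemmerLookup and both helper methods are made up for this example.

```java
class StemmerLookup {
    // Package prefix taken from the line quoted above.
    static final String SNOWBALL_PACKAGE = "org.tartarus.snowball.ext.";

    // Builds the fully qualified class name the same way the quoted line does.
    static String stemmerClassName(String name) {
        return SNOWBALL_PACKAGE + name + "Stemmer";
    }

    // Returns true only if a class with that name can be loaded
    // from the current classpath.
    static boolean stemmerExists(String name) {
        try {
            Class.forName(stemmerClassName(name));
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "name" is the literal string passed in the question's code.
        System.out.println(stemmerClassName("name")); // org.tartarus.snowball.ext.nameStemmer
        System.out.println(stemmerExists("name"));    // false
    }
}
```

A name such as "English" would resolve only when the Snowball stemmer classes are on the classpath, so the result for valid names depends on your setup.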

Upvotes: 0
