Reputation: 345
I tested StandardAnalyzer with IndexWriter and found that it automatically removes stopwords, even though I did not supply a stopword list. This is the code I used:
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);
Where is the default stopword list? Also, does this analyzer automatically stem words too?
Upvotes: 3
Views: 4142
Reputation: 10020
According to the API docs, there is a default set of stopwords (taken from the English language), stored in StandardAnalyzer.STOP_WORDS_SET. It is used when you create the analyzer with the constructor public StandardAnalyzer(Version matchVersion), which is exactly what you do. The set is the same as StopAnalyzer.ENGLISH_STOP_WORDS_SET. You can use one of the other constructors to pass the analyzer a different (possibly empty) set of stopwords.
StandardAnalyzer doesn't stem words. If you need stemming, use, for example, SnowballAnalyzer.
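To make the difference between stopword removal and stemming concrete, here is a rough plain-Java sketch. It is not Lucene code and does not reproduce Lucene's tokenizer: the class name StopwordSketch, the analyze helper, and the small stopword subset are all assumptions made up for illustration. It only mimics the observable behavior described above: tokens are lowercased, stopwords are dropped, and the remaining words are left unstemmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // Illustrative subset only (an assumption); the real default set lives in
    // StandardAnalyzer.STOP_WORDS_SET and contains more entries.
    static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "and", "are", "is", "of", "the", "to"));

    // Lowercases the text, splits on non-alphanumeric characters, and drops
    // any token found in stopWords. No stemming is applied.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The dogs are running to the park";
        // With the stopword subset: "dogs" and "running" stay unstemmed.
        System.out.println(analyze(text, STOP_WORDS));
        // With an empty set (like passing an empty stopword set): nothing is removed.
        System.out.println(analyze(text, new HashSet<String>()));
    }
}
```

Note that "dogs" and "running" come out unchanged in both cases. A stemming analyzer would typically reduce them to forms like "dog" and "run", which is the part StandardAnalyzer does not do.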
Upvotes: 4