user1225072

Reputation: 345

Does Lucene's StandardAnalyzer remove stop words and perform stemming?

I tested StandardAnalyzer with an IndexWriter and found that it automatically removes stop words, even though I never supplied a stop word list. This is the code I used:

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35, analyzer);

Where is the default stop word list defined? Also, does this analyzer automatically stem words?

Upvotes: 3

Views: 4142

Answers (1)

Michał Kosmulski

Reputation: 10020

According to the API docs, there is a default set of English stop words, stored in StandardAnalyzer.STOP_WORDS_SET. It is used when you create the analyzer with the constructor public StandardAnalyzer(Version matchVersion), which is exactly what you do. The set is the same as StopAnalyzer.ENGLISH_STOP_WORDS_SET. You can use one of the other constructors to pass the analyzer a different (possibly empty) set of stop words.

StandardAnalyzer doesn't stem words. If you need stemming, use, for example, SnowballAnalyzer.
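To make the difference between the two behaviours concrete, here is a minimal plain-Java sketch. It is not Lucene code and does not reproduce StandardAnalyzer's tokenization rules; the class StopWordSketch, the method analyze, and the short SAMPLE_STOP_WORDS set are all hypothetical names made up for illustration, and the set is only a small subset chosen for the example, not the list Lucene ships. It shows that filtering stop words drops whole tokens while leaving the remaining words unstemmed, and that passing an empty set keeps everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; it does not use or imitate the Lucene API.
class StopWordSketch {

    // Small illustrative stop word set (an assumption for this example only,
    // not the full default list used by StandardAnalyzer).
    static final Set<String> SAMPLE_STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "are", "is", "of", "the", "to"));

    // Lowercases the text, splits it on runs of non-letter/non-digit characters,
    // and drops any token found in stopWords. No stemming is applied.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The dogs are running to the parks";

        // Stop words removed, remaining words left unstemmed.
        System.out.println(analyze(text, SAMPLE_STOP_WORDS)); // [dogs, running, parks]

        // Empty stop word set: nothing is removed.
        System.out.println(analyze(text, new HashSet<String>()));
        // [the, dogs, are, running, to, the, parks]
    }
}
```

Note that "running" and "parks" come out unchanged. A stemming analyzer would additionally reduce such words to a root form, which is the part StandardAnalyzer does not do.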

Upvotes: 4
