Reputation: 11
I have two major questions about the Lucene Demo. Does the Lucene demo use stopwords before any modification? What about stemming? If so, what stemmer does it use?
Upvotes: 0
Views: 509
Reputation: 22042
Which demo are you referring to?
If it's this one, then the answers are:
(a) Stop words: no, it does not. It uses the StandardAnalyzer(), which does not use stop words when created with no arguments (but it can, if you choose to provide some).
(b) Stemming: no, it does not. There are no stemming classes involved in the demo code, and the standard analyzer itself does not perform any stemming.
Take a look at the javadoc for the StandardAnalyzer. You will see the following:
Filters StandardTokenizer with LowerCaseFilter and StopFilter, using a configurable list of stop words.
So, this tells you how your input documents are analyzed:
Using the StandardTokenizer, the rules for which you can read about here.
Using the LowerCaseFilter, which works as you would expect.
Using the StopFilter, for which you may or may not have provided any stop words.
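To make the three steps concrete, here is a rough plain-Java sketch of the same chain: tokenize, lowercase, then drop stop words. It deliberately does not call Lucene. The class AnalysisChainSketch and its methods tokenize and analyze are hypothetical names invented for this illustration, and the regex split is only a crude stand-in for the StandardTokenizer rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: this is NOT Lucene code and calls no Lucene classes.
class AnalysisChainSketch {

    // Crude stand-in for StandardTokenizer (assumption): split on anything
    // that is not a letter or digit. The real tokenizer rules are more subtle.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Mirrors the order described above: tokenizer, then lowercasing,
    // then stop-word removal. There is no stemming step anywhere.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Foxes are Running";
        // Empty stop set: every token survives, only lowercased.
        System.out.println(analyze(text, Set.of()));
        // Caller-supplied stop set: "the" and "are" are dropped.
        System.out.println(analyze(text, Set.of("the", "are")));
    }
}
```

With an empty stop set nothing is filtered out, and in both cases "foxes" and "running" come through unstemmed, which matches answers (a) and (b) above.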
Upvotes: 0