Reputation: 2011
I am trying to work out what the default stopword list is for the standard analyzer in Elasticsearch. I am running version 1.3.1, and it seems to me that the English list is used, because running a wildcard query like this:
{
  "wildcard" : {
    "name" : {
      "wildcard" : "*in*"
    }
  }
}
gives me no results (I definitely have documents whose names contain "in", and they are returned when I use a not_analyzed mapping). However, the 1.0 breaking changes say the default is now empty, and the Standard Analyzer documentation for the latest version states the same. On the other hand, when I follow the link given there for more details, I end up at the Stop Analyzer documentation, which says that the default is still English.
Any help? Thanks.
Upvotes: 1
Views: 451
Reputation: 52368
This is the English stopword list that the standard analyzer refers to, as defined in Lucene's StopAnalyzer: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.9.0/org/apache/lucene/analysis/core/StopAnalyzer.java?av=f#50
static {
  final List<String> stopWords = Arrays.asList(
    "a", "an", "and", "are", "as", "at", "be", "but", "by",
    "for", "if", "in", "into", "is", "it",
    "no", "not", "of", "on", "or", "such",
    "that", "the", "their", "then", "there", "these",
    "they", "this", "to", "was", "will", "with"
  );
  final CharArraySet stopSet = new CharArraySet(Version.LUCENE_CURRENT,
      stopWords, false);
  ENGLISH_STOP_WORDS_SET = CharArraySet.unmodifiableSet(stopSet);
}
Elasticsearch source code for the standard analyzer: https://github.com/elastic/elasticsearch/blob/v1.3.1/src/main/java/org/elasticsearch/index/analysis/StandardAnalyzerProvider.java#L47
This links to Lucene's StandardAnalyzer, which in turn references StopAnalyzer's stopwords list: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.9.0/org/apache/lucene/analysis/standard/StandardAnalyzer.java?av=f#63
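To see what removing these stopwords would do to a field value, here is a minimal sketch in plain Java with no Lucene dependency. The class name StopwordSketch and the helper removeStopwords are hypothetical, and the tokenization (lowercase, then split on anything that is not a letter or digit) is only a rough stand-in for Lucene's StandardTokenizer. Whether the list is actually applied in a given index depends on how the analyzer is configured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordSketch {
    // The same 33 words as the ENGLISH_STOP_WORDS_SET shown in the Lucene source above.
    static final Set<String> ENGLISH_STOP_WORDS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by",
            "for", "if", "in", "into", "is", "it",
            "no", "not", "of", "on", "or", "such",
            "that", "the", "their", "then", "there", "these",
            "they", "this", "to", "was", "will", "with")));

    // Hypothetical helper: a simplified approximation of analysis with stopwords enabled.
    // Lowercases the text, splits on non-alphanumeric characters, and drops stopword tokens.
    static List<String> removeStopwords(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !ENGLISH_STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(removeStopwords("Made in Berlin")); // prints [made, berlin]
    }
}
```

Note that only whole tokens are dropped: the standalone word "in" disappears, but a longer token such as "berlin" is kept even though it contains "in" as a substring.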
Upvotes: 2