Elasticsearch standard analyser stopwords

Question

I am trying to work out what the default stopword list is for the standard analyzer in Elasticsearch. I am running version 1.3.1, and it looks to me as though the English list is used, because a wildcard query like this

{
      "wildcard" : {
        "name" : {
          "wildcard" : "*in*"
        }
      }
}

returns no results (I definitely have documents whose names contain "in", and they are returned when the field is mapped as not_analyzed). However, the 1.0 breaking changes say the default is now empty, and the Standard Analyzer documentation for the latest version says the same. On the other hand, when I follow the link given there for more details, I end up at the Stop Analyzer documentation, which says the default is still English.

Any help? Thanks.

Andrei Stefan · Accepted Answer

This would be the list of stopwords for the standard analyzer: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.9.0/org/apache/lucene/analysis/core/StopAnalyzer.java?av=f#50

  static {
    final List<String> stopWords = Arrays.asList(
      "a", "an", "and", "are", "as", "at", "be", "but", "by",
      "for", "if", "in", "into", "is", "it",
      "no", "not", "of", "on", "or", "such",
      "that", "the", "their", "then", "there", "these",
      "they", "this", "to", "was", "will", "with"
    );
    final CharArraySet stopSet = new CharArraySet(Version.LUCENE_CURRENT,
        stopWords, false);
    ENGLISH_STOP_WORDS_SET = CharArraySet.unmodifiableSet(stopSet);
  }

Elasticsearch source code for the standard analyzer: https://github.com/elastic/elasticsearch/blob/v1.3.1/src/main/java/org/elasticsearch/index/analysis/StandardAnalyzerProvider.java#L47

This links to Lucene's StandardAnalyzer, which in turn references StopAnalyzer's stopword list: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.9.0/org/apache/lucene/analysis/standard/StandardAnalyzer.java?av=f#63
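To make the effect of such a list concrete, here is a minimal plain-Java sketch of what happens to a name containing "in" when the English stopword list is applied, compared with an empty list. It is an illustration only: the class `StopwordSketch`, the method `analyze`, and the sample name "Made in Italy" are hypothetical, the tokenization is a rough approximation of the standard tokenizer, and none of Lucene's or Elasticsearch's classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only; it does not use Lucene's classes.
public class StopwordSketch {

    // Same words as ENGLISH_STOP_WORDS_SET in the listing above.
    static final Set<String> ENGLISH_STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Lowercases the text, splits on anything that is not a letter or digit
    // (an assumed simplification of the standard tokenizer), and drops
    // every token found in stopWords.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String name = "Made in Italy";
        // With the English list, "in" never becomes an indexed term.
        System.out.println(analyze(name, ENGLISH_STOP_WORDS)); // [made, italy]
        // With an empty list, "in" is kept.
        System.out.println(analyze(name, new HashSet<String>())); // [made, in, italy]
    }
}
```

If the English list were in effect, the token "in" would never reach the index, so only terms that merely contain those letters could match a pattern such as `*in*`. With an empty list the token is kept as its own term.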
