MTA

Reputation: 1073

Solr: Is re-indexing a must for stop words?

Does Solr 4.10.3 eliminate stop words from a query phrase if we add them to the stopwords.txt file without re-indexing the documents? Or is re-indexing the documents a must?

I added the stop words (without re-indexing the documents), and Solr still returns results without eliminating them.

I restarted Solr after adding the list to the stopwords.txt file.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <similarity class="solr.DFRSimilarityFactory">
      <str name="basicModel">I(F)</str>
      <str name="afterEffect">B</str>
      <str name="normalization">H2</str>
    </similarity>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <!-- in this example, we will only use synonyms at query time
                 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Upvotes: 0

Views: 958

Answers (1)

Vinod

Reputation: 1953

Consider the query q=Iron man of India.

If your query analyzer uses a stop filter and the word "of" is in the stop word list, Solr first splits the query into these tokens:

Iron, man, of, India

Because of the stop filter, Solr then discards "of" and searches for documents containing the tokens (Iron, man, India). Each result's score depends on factors such as how many of the tokens appear in the document and how often each one appears (the TF-IDF score).

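The same applies when you use a stop filter at index time: Solr indexes the tokens (Iron, man, India) and does not index "of". Documents that were indexed before the stop word was added still contain that token until they are re-indexed.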
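To make the difference between query-time and index-time filtering concrete, here is a rough sketch in plain Python. It is only a toy model, not Solr's actual implementation: `analyze` and `build_index` are made-up helper names, the regex tokenizer merely stands in for StandardTokenizerFactory, and the second document is invented for illustration.

```python
import re

def analyze(text, stopwords):
    """Toy analysis chain: tokenize, drop stop words (ignoring case), lowercase."""
    tokens = re.findall(r"\w+", text)
    kept = [t for t in tokens if t.lower() not in stopwords]
    return [t.lower() for t in kept]

def build_index(docs, stopwords):
    """Map each analyzed token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text, stopwords):
            index.setdefault(token, set()).add(doc_id)
    return index

# Hypothetical documents, for illustration only.
docs = {1: "Iron man of India", 2: "Man of steel"}

# Index built before "of" was added to the stop word list.
old_index = build_index(docs, stopwords=set())

# The query is analyzed with the updated list, so "of" is dropped right away.
query_tokens = analyze("Iron man of India", {"of"})

# Only rebuilding the index removes "of" from the indexed tokens.
new_index = build_index(docs, stopwords={"of"})

print(query_tokens)        # ['iron', 'man', 'india']
print("of" in old_index)   # True
print("of" in new_index)   # False
```

In this model the query side picks up the new stop word list immediately, while the old index keeps its "of" entries until the documents are analyzed again.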

Upvotes: 2
