Denis Kuznetsov

Reputation: 382

Solr partial search, strange behaviour

I am seeing strange behaviour with Solr partial search. I use this filter:

<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" />

I have tried both Solr 4 and Solr 5 and get the following matching results in each.

1) Word: Notifications

Not - YES
Noti - YES
Notif - YES
Notifi - NO
Notific - YES
Notifica - NO
Notificat - NO
Notificati - NO
Notificatio - NO
Notification - YES
Notifications - YES

2) Two words: A - Multiplication and B - Multiplicatination (misspelled)

Mul: A - YES, B - YES
Mult: A - YES, B - YES
Multi: A - YES, B - YES
Multip: A - YES, B - YES
Multipl: A - YES, B - YES
Multipli: A - NO, B - YES
Multiplic: A - YES, B - YES
Multiplica: A - NO, B - YES
Multiplicat: A - NO, B - YES
Multiplicati: A - NO, B - YES
Multiplicatin: A - NO, B - YES
Multiplicatina: A - NO, B - NO
Multiplicatinat: A - NO, B - NO
Multiplicatinati: A - NO, B - NO
Multiplicatinatio: A - NO, B - NO
Multiplicatination: A - NO, B - YES
Multiplicatio: A - NO, B - NO
Multiplication: A - YES, B - YES (!!!)

Why does it behave so strangely? How can I fix it?

Why does "Notific" match "Notifications", while "Notifi", "Notifica" and "Notificatio" do not? Why does "Multiplica" match "Multiplicatination" but not "Multiplication"? Why does "Multiplication" match "Multiplicatination"? How does this work?

I run the following query (captured from the debugger):

/select?q="multiplic"&fq=(ss_search_api_datasource%3A"entity%3Anode"+ss_media_bundle%3A"document")&fq=(ss_search_api_datasource%3A"entity%3Amedia"+ss_node_bundle%3A"task"+ss_node_bundle%3A"supply"+ss_node_bundle%3A"store"+ss_node_bundle%3A"news"+ss_node_bundle%3A"faq")&fq=index_id%3A"search"&fq=hash%3A"8qk984"&rows=3&fl=ss_search_api_id%2Cscore&wt=json&indent=true&defType=edismax&qf=tm_attachment_file%5E1+ts_media_name%5E8+ts_media_file_name%5E2+ts_node_title%5E13+ts_node_body%5E3&stopwords=true&lowercaseOperators=true

The field definition used, from schema.xml for Solr 5:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Upvotes: 0

Views: 98

Answers (1)

Abhijit Bashetti

Reputation: 8678

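I think the SnowballPorterFilterFactory is causing the problem. In your index analyzer it runs before the EdgeNGramFilterFactory, so the edge n-grams are built from the stemmed token rather than from the original word, and the query analyzer also stems the search term before looking it up. A query therefore matches only when its stemmed form is one of the prefixes of the stemmed indexed word. For example, "Notifications" is most likely stemmed to "notif", so only "not", "noti" and "notif" get indexed; "Notific" and "Notification" are stemmed to "notif" as well and match, while "Notifi" and "Notifica" are left unchanged by the stemmer and find nothing. Can you check this by removing the stemmer from the index analyzer? The Analysis screen in the Solr admin UI will show the exact tokens produced at index and query time.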

You can find more information about it Here
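
If the stemmer does turn out to be the cause, one option is a separate field type for prefix matching that leaves stemming out of both analyzers. The following is only a minimal sketch, not a drop-in replacement: the name "text_prefix" is made up for illustration, and the char filter, stop word, synonym and word delimiter filters from your original definition are omitted for brevity and would need to be added back if you rely on them.

```xml
<!-- Hypothetical field type: "text_prefix" is an illustrative name, not part of the original schema. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No stemmer here, so the grams are prefixes of the original lowercased word. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- The query term is only lowercased, then compared against the indexed grams. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a definition like this, "notifi" or "notifica" would be looked up as typed and should find the matching gram of "notifications". Note that maxGramSize="20" still means query terms longer than 20 characters will not match, and that any change to the index analyzer only takes effect after the documents are reindexed.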

Upvotes: 1
