user1396224

What is the best way to do synonym search for multi-word keywords in Solr?

I want to use synonym search in Solr for multi-word keywords, but it doesn't work correctly.

In synonyms.txt I mapped "multerm" to the synonym "multi term". I expected Solr to turn a query for "multerm" into a phrase query such as field:"multi term"~0, but instead it produces field:multi | field:term. As a result, it can't do a proximity (phrase) search for the multi-word synonym.

Does anyone know the best way to do multi-word synonym search in Solr? Any help is appreciated.

Answers (1)

Aujasvi Chitkara

Here is how I handle multi-word synonyms. In my schema.xml, the fieldType definition looks like this:

<fieldType name="custom_text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- We will use synonyms only at index time to keep querying fast-->
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- We will use synonyms only at index time to keep querying fast
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" />
    </analyzer>
</fieldType>
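To actually use this type, each field that should get the index-time synonyms has to reference it by name in schema.xml. A minimal sketch; the field name "title" is a hypothetical placeholder:

```xml
<!-- Hypothetical field: "title" is a placeholder name; the type attribute
     must match the fieldType name defined above (custom_text_general) -->
<field name="title" type="custom_text_general" indexed="true" stored="true"/>
```

Because the synonyms are applied only when documents are indexed, a change to synonyms.txt takes effect only after reloading the core and re-indexing the affected documents.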

A couple of things to note:

  • I am using synonyms only at index time, to keep queries fast.
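  • I added KeywordTokenizerFactory as the tokenizer for the synonym filter. It is applied to the entries in synonyms.txt, not to the field text, and keeps each entry as a single token, so multi-word synonyms are not split on whitespace.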
  • I added expand="true". If expand is true, a synonym will be expanded to all equivalent synonyms. If it is false, all equivalent synonyms will be reduced to the first in the list.
  • The query-time synonym filter is commented out.
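For comparison, a sketch of the alternative setting (not part of the setup above): the same filter with expand="false". Since reduction rewrites every equivalent synonym to the first entry in the list, this variant would typically go into both the index and the query analyzer, so that query terms are reduced to the same entry as the indexed terms.

```xml
<!-- Hypothetical variant, not the configuration above: expand="false" reduces
     each group of equivalent synonyms to the first entry in synonyms.txt.
     Place it in both the index and the query analyzer so both sides reduce alike. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false" tokenizerFactory="solr.KeywordTokenizerFactory"/>
```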
