Wasim Karani

Reputation: 8886

Original frequency does not match suggestion frequency in Solr

When "whs is" is returned as a suggestion for "who is", its frequency is 73, but when I query "whs is" directly, its original frequency is 94.
For reference, I am attaching two images of the output.




Is there any way to make both frequencies the same?

schema.xml looks like this:

<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="gram_ci" type="textSpellCi" indexed="true" stored="false" multiValued="false"/>

<copyField source="gram" dest="gram_ci"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>
<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>

solrconfig.xml looks like this:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpellCi</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">gram_ci</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">0</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">2</int>
        <float name="maxQueryFrequency">0.99</float>
        <str name="comparatorClass">freq</str>
        <float name="thresholdTokenFrequency">0.0</float>
    </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="df">gram_ci</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">15</str>
        <str name="spellcheck.alternativeTermCount">10</str>
        <str name="spellcheck.onlyMorePopular">false</str>
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>

Upvotes: 3

Views: 381

Answers (1)

alexf

Reputation: 1301

I think these two frequencies do not come from the same index:

  • for /spell?spellcheck.q="who is", the frequency of "whs is" is its frequency in the spellchecker's index.

  • for /spell?spellcheck.q="whs is", the frequency of "whs is" is its frequency in the main Lucene index.
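
To check the two numbers side by side, one option is to send both requests and pull the frequencies out of the JSON responses. The following is only a sketch: the base URL and core name are hypothetical, the sample responses are hand-written from the numbers in the question (they are not real Solr output), and the parser assumes the flat `[term, details, ...]` layout that Solr uses for `suggestions` when `spellcheck.extendedResults=true`.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core name to your setup.
SOLR_SPELL_URL = "http://localhost:8983/solr/collection1/spell"


def spell_url(phrase, base=SOLR_SPELL_URL):
    """Build a /spell request URL that spellchecks one quoted phrase."""
    params = {
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "spellcheck.q": '"%s"' % phrase,
    }
    return base + "?" + urlencode(params)


def extract_frequencies(response):
    """Return {term: (origFreq, {suggested_word: freq})}.

    Assumes response["spellcheck"]["suggestions"] is a flat list of
    alternating term / details entries (extendedResults=true).
    """
    flat = response["spellcheck"]["suggestions"]
    result = {}
    for term, details in zip(flat[0::2], flat[1::2]):
        if not isinstance(details, dict):
            continue  # skip pairs such as "correctlySpelled", False
        words = {s["word"]: s["freq"] for s in details.get("suggestion", [])}
        result[term] = (details.get("origFreq"), words)
    return result


# Hand-written samples built from the numbers in the question (assumption).
resp_who = {"spellcheck": {"suggestions": [
    "who is", {"suggestion": [{"word": "whs is", "freq": 73}]},
    "correctlySpelled", False,
]}}
resp_whs = {"spellcheck": {"suggestions": [
    "whs is", {"origFreq": 94, "suggestion": []},
]}}

as_suggestion = extract_frequencies(resp_who)["who is"][1]["whs is"]
as_original = extract_frequencies(resp_whs)["whs is"][0]
print(as_suggestion, as_original)  # prints "73 94"
```

Against a live server, the responses could be fetched with `urllib.request.urlopen(spell_url("who is"))` and decoded with `json.load`, then passed to `extract_frequencies` in the same way.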

To get the same frequency in both cases, I guess you have to use solr.DirectSolrSpellChecker, which reads terms directly from the main index, instead of solr.IndexBasedSpellChecker in your searchComponent:

http://wiki.apache.org/solr/DirectSolrSpellChecker

Edit: this depends on how you index your data.

Upvotes: 1
