Reputation: 8886

Original frequency is not matching with suggestion frequency in SOLR

When "whs is" is returned as a suggestion for "who is", its reported frequency is 73, but when I query "whs is" directly, its original frequency is 94.
For reference, I am attaching two screenshots of the output.

Output for /spell?spellcheck.q="who is"

[image: spellcheck response]

Output for /spell?spellcheck.q="whs is"

Is there any way to make both frequencies the same?

schema.xml looks like this:

<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="gram_ci" type="textSpellCi" indexed="true" stored="false" multiValued="false"/>

<copyField source="gram" dest="gram_ci"/>

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>
<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>

solrconfig.xml looks like this:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpellCi</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">gram_ci</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="distanceMeasure">internal</str>
        <float name="accuracy">0.5</float>
        <int name="maxEdits">2</int>
        <int name="minPrefix">0</int>
        <int name="maxInspections">5</int>
        <int name="minQueryLength">2</int>
        <float name="maxQueryFrequency">0.99</float>
        <str name="comparatorClass">freq</str>
        <float name="thresholdTokenFrequency">0.0</float>
    </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="df">gram_ci</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">15</str>
        <str name="spellcheck.alternativeTermCount">10</str>
        <str name="spellcheck.onlyMorePopular">false</str>
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>

Upvotes: 3

Answers (1)

alexf

Reputation: 1301

I think these two frequencies do not come from the same index:

for /spell?spellcheck.q="who is" --> the frequency shown for "whs is" is the frequency of this term in the spellchecker's index.
for /spell?spellcheck.q="whs is" --> the frequency shown for "whs is" is the frequency of this term in the general Lucene index.

To get the same frequency in both cases, I guess you have to use solr.DirectSolrSpellChecker instead of solr.IndexBasedSpellChecker in your searchComponent:

http://wiki.apache.org/solr/DirectSolrSpellChecker

Edit: This depends on how you index your data.
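
To illustrate the distinction between the two spellchecker types, here is a rough sketch rather than a tested configuration. It assumes the gram_ci field from the question; the dictionary names and the ./spellchecker directory are placeholders chosen for the example. solr.IndexBasedSpellChecker builds a separate spelling index whose term counts are only as fresh as its last build, whereas solr.DirectSolrSpellChecker reads terms directly from the main index.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpellCi</str>

    <!-- Index-based: copies terms from gram_ci into a separate sidecar index.
         Frequencies reflect the state of that index at its last build. -->
    <lst name="spellchecker">
        <str name="name">indexBased</str> <!-- placeholder name -->
        <str name="classname">solr.IndexBasedSpellChecker</str>
        <str name="field">gram_ci</str>
        <str name="spellcheckIndexDir">./spellchecker</str> <!-- placeholder path -->
        <str name="buildOnCommit">true</str>
    </lst>

    <!-- Direct: no sidecar index; terms are read from the main index. -->
    <lst name="spellchecker">
        <str name="name">direct</str> <!-- placeholder name -->
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <str name="field">gram_ci</str>
    </lst>
</searchComponent>
```

With an index-based dictionary, the sidecar index can be rebuilt by sending spellcheck.build=true to the handler, so counts from a stale build may differ from those in the main index.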

Upvotes: 1
