Reputation: 8886
The frequency reported for "whs is" when it is returned as a suggestion for "who is" (73) differs from its actual original frequency (94).
For reference, I have attached two images of the output:
/spell?spellcheck.q="who is"
/spell?spellcheck.q="whs is"
Is there any way to make both frequencies the same?
My schema.xml looks like this:
<field name="gram" type="textSpell" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="gram_ci" type="textSpellCi" indexed="true" stored="false" multiValued="false"/>
<copyField source="gram" dest="gram_ci"/>
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>
My solrconfig.xml looks like this:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.99</float>
    <str name="comparatorClass">freq</str>
    <float name="thresholdTokenFrequency">0.0</float>
  </lst>
</searchComponent>
<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">15</str>
    <str name="spellcheck.alternativeTermCount">10</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
Upvotes: 3
Views: 381
Reputation: 1301
I think these two frequencies do not come from the same index:
For /spell?spellcheck.q="who is", the frequency of "whs is" is the frequency of this term in the spellchecker's own index.
For /spell?spellcheck.q="whs is", the frequency of "whs is" is the frequency of this term in the general Lucene index.
To get the same frequency, you have to use solr.DirectSolrSpellChecker instead of solr.IndexBasedSpellChecker (I guess) in your searchComponent:
http://wiki.apache.org/solr/DirectSolrSpellChecker
Edit: This depends on how you index your data.
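For contrast, here is a rough sketch of what an index-based checker on the same gram_ci field might look like. The dictionary name "indexBased" and the directory "./spellchecker" are placeholders I made up, not values from the question. This kind of checker keeps a separate sidecar index that has to be rebuilt, so its term frequencies can drift from the main Lucene index, whereas solr.DirectSolrSpellChecker reads terms straight from the main index.

```xml
<!-- Hypothetical sketch: an index-based checker on the same gram_ci field. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <!-- "indexBased" is an illustrative dictionary name (assumption) -->
    <str name="name">indexBased</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- Separate sidecar index; the directory name is an assumption -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Rebuild the sidecar index on every commit so it does not go stale -->
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

If a setup like this is in play, rebuilding the spellcheck index after reindexing (for example by sending spellcheck.build=true once) would be the first thing to try before comparing the two frequencies again.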
Upvotes: 1