R..

Reputation: 123

Spellcheck Solr: solr.DirectSolrSpellChecker config

I am trying to test the spellchecking functionality with Solr 4.7.2 using solr.DirectSolrSpellChecker (where you don't need to build a dedicated index).

I have a field named "title" in my index; I used a copyField definition to create a field named "title_spell" to be queried for the spellcheck (title_spell is correctly filled). However, in the Solr admin console, I always get empty suggestions.

For example: I have a Solr document with the title "A B automobile". In the admin console I tick the spellcheck checkbox and enter "atuomobile" in the spellcheck.q input field. I expect to get at least something like "A B automobile" or "automobile", but the spellcheck suggestion remains empty.

My configuration:

schema.xml (only relevant part copied):

    <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StandardFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="de_DE/synonyms.txt" ignoreCase="true"
                    expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StandardFilterFactory"/>
        </analyzer>
    </fieldType>
    ...
    <field name="title_spell" type="textSpell" indexed="true" stored="true" multiValued="false"/>

solrconfig.xml (only relevant part copied):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">textSpell</str>
        <lst name="spellchecker">
            <str name="name">default</str>
            <str name="field">title_spell</str>
            <str name="classname">solr.DirectSolrSpellChecker</str>
            <str name="distanceMeasure">internal</str>
            <float name="accuracy">0.5</float>
            <int name="maxEdits">2</int>
            <int name="minPrefix">1</int>
            <int name="maxInspections">5</int>
            <int name="minQueryLength">4</int>
            <float name="maxQueryFrequency">0.01</float>
            <float name="thresholdTokenFrequency">.01</float>
        </lst>
    </searchComponent>
    ...
    <requestHandler name="standard" class="solr.SearchHandler" default="true">
        <lst name="defaults">
            <str name="defType">edismax</str>
            <str name="echoParams">explicit</str>
        </lst>
        <!-- Attempt to factor the online date into the weighting... -->
        <lst name="appends">
            <str name="bf">recip(ms(NOW/MONTH,sort_date___d_i_s),3.16e-11,50,1)</str>
            <!--<str name="qf">title___td_i_s_gcopy^1e-11</str>-->
            <str name="qf">title___td_i_s_gcopy^21</str>
            <str name="q.op">AND</str>
        </lst>

        <arr name="last-components">
            <str>spellcheck</str>
        </arr>
    </requestHandler>

What did I miss? Thanks for your answers!

Upvotes: 0

Views: 1187

Answers (2)

TMBT

Reputation: 1183

How large is your index? For a small index (think less than a few million docs), you're going to have to tune accuracy, maxQueryFrequency, and thresholdTokenFrequency. (Actually, it would probably be worth doing this on larger indices as well.)

For example, my 1.5 million doc index uses the following for these settings:

      <float name="maxQueryFrequency">0.01</float>
      <float name="thresholdTokenFrequency">.00001</float>
      <float name="accuracy">0.5</float>

accuracy tells Solr how accurate a result needs to be before it's considered worth returning as a suggestion.

maxQueryFrequency is the maximum fraction of documents a query term may appear in and still be considered for correction; a term that occurs more often than this is assumed to be spelled correctly, so no suggestion is offered for it.

thresholdTokenFrequency tells Solr what fraction of documents a term must appear in before it's considered worth returning as a suggestion.
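
To make the roles of the three settings concrete, here is a minimal sketch of the spellchecker definition from the question with the values quoted above. The comments reflect my reading of the Solr documentation, and the numbers are only a starting point to tune against your own index, not recommended values.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">title_spell</str>
        <str name="classname">solr.DirectSolrSpellChecker</str>
        <!-- minimum similarity score a candidate must reach to be suggested -->
        <float name="accuracy">0.5</float>
        <!-- query terms found in more than this fraction of documents
             are treated as correctly spelled -->
        <float name="maxQueryFrequency">0.01</float>
        <!-- a candidate must occur in at least this fraction of documents -->
        <float name="thresholdTokenFrequency">.00001</float>
    </lst>
</searchComponent>
```

On a very small test index, a high thresholdTokenFrequency is a likely reason for empty suggestions, so lowering it first is a cheap experiment.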

If you plan to use spellchecking on multiple phrases, you may need to add a ShingleFilter to your title_spell field.
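
A rough sketch of what a shingled spellcheck field could look like follows. The field type name textSpellShingle is made up for illustration, and the shingle sizes are assumptions; only solr.ShingleFilterFactory and its attributes come from Solr itself.

```xml
<!-- Hypothetical field type name; shingle sizes are assumptions to tune. -->
<fieldType name="textSpellShingle" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- emits single words plus two-word shingles such as "b automobile" -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>
<field name="title_spell" type="textSpellShingle" indexed="true" stored="true" multiValued="false"/>
```

After changing the field type you would need to reindex so that title_spell is rebuilt with the new analyzer.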

Another thing you might try is setting your queryAnalyzerFieldType to title_spell.

Upvotes: 2

YoungHobbit

Reputation: 13402

Can you please try editing your requestHandler declaration:

    <requestHandler name="/standard" class="solr.SearchHandler" default="true">

and use a query URL like:

    http://localhost:8080/solr/service/standard?q=<term>&qf=title_spell
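
For illustration, the full handler might then look roughly like the sketch below. The three spellcheck defaults are my own assumption, not something from the question; they only save you from passing spellcheck=true on every request, and "default" refers to the spellchecker name defined in the question's searchComponent.

```xml
<requestHandler name="/standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <!-- assumption: enable spellchecking by default for this handler -->
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.count">5</str>
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>
```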

Experiment with short terms first to learn how it behaves. One limitation is that it only returns terms that start with the query term. You can use FuzzyLookupFactory instead, which also matches and returns fuzzy results. For more information, check the Solr suggester wiki.
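
A minimal sketch of a suggester using FuzzyLookupFactory, assuming your Solr version ships solr.SuggestComponent (as far as I know it was introduced around 4.7). The component name, the suggester name fuzzySuggester and the /suggest handler path are placeholders; the field and field type are taken from the question.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
        <!-- hypothetical suggester name -->
        <str name="name">fuzzySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <!-- field and analyzer taken from the question's schema -->
        <str name="field">title_spell</str>
        <str name="suggestAnalyzerFieldType">textSpell</str>
    </lst>
</searchComponent>

<!-- hypothetical handler path -->
<requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">fuzzySuggester</str>
        <str name="suggest.count">5</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>
```

With something like this in place, a request such as /suggest?suggest.q=atuomobile would be the way to check whether fuzzy matches come back; the suggester has to be built first (for example with suggest.build=true).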

Upvotes: 0
