Samuel-Zacharie Faure

Reputation: 1152

Solr: Suggester dictionary build creates huge temporary files

I'm using sunspot-solr 2.3.0 for my Rails app. I implemented a suggester (AnalyzingSuggester) in Solr for autocompletion. I have a database of about 11M entries, with 5 fields indexed by Solr.

When building the suggestions dictionary, two files are created in my /tmp/ folder:

I have searched a lot but can't work out what exactly is happening, or whether and how I should prevent this behavior. Is this a normal part of the dictionary build? Is it just some logging?
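
To see which files are growing while the dictionary builds, something like the following can be run a few times during the build. It is only a diagnostic sketch: `largest_files` is a hypothetical helper, not part of Solr or sunspot, and it assumes the files land directly in the system temporary directory (`/tmp` here).

```python
import os
import tempfile


def largest_files(directory, top=5):
    """Return (size_in_bytes, path) pairs for the largest regular files
    directly inside `directory`, biggest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            try:
                if entry.is_file(follow_symlinks=False):
                    size = entry.stat(follow_symlinks=False).st_size
                    entries.append((size, entry.path))
            except OSError:
                # The file may vanish or be unreadable mid-scan; skip it.
                continue
    entries.sort(reverse=True)
    return entries[:top]


if __name__ == "__main__":
    # Assumption: the build writes into the default temp dir (e.g. /tmp).
    for size, path in largest_files(tempfile.gettempdir()):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

Comparing two runs a minute apart shows whether the files are still growing or have levelled off.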

Upvotes: 1

Views: 1111

Answers (1)

user2626547

Reputation: 23

I have the same problem with Solr 6.0.1. The temporary file keeps growing until the hard drive is full.

My index only contains about 2500 documents.

Search component:

<searchComponent class="solr.SuggestComponent" name="autoSuggest">
    <lst name="suggester">
        <str name="name">analyzingSuggester</str>
        <str name="lookupImpl">AnalyzingLookupFactory</str>
        <str name="storeDir">analyzing_suggestions</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="buildOnCommit">false</str>
        <str name="buildOnStartup">false</str>
        <str name="field">text_suggest_auto</str>
        <str name="suggestAnalyzerFieldType">text_suggestion_auto</str>
    </lst>
</searchComponent>

Request handler:

<requestHandler class="solr.SearchHandler" name="/suggestAuto" startup="lazy">
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">analyzingSuggester</str>
        <str name="suggest.onlyMorePopular">true</str>
        <str name="suggest.count">10</str>
        <str name="suggest.collate">true</str>
    </lst>
    <arr name="components">
        <str>autoSuggest</str>
    </arr>
</requestHandler>
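
Since `buildOnCommit` and `buildOnStartup` are both false, the dictionary is only built when a request asks for it with `suggest.build=true`. Here is a minimal sketch of how that request URL could be put together for the handler above. The base URL and the core name `mycore` are placeholders, not values from my setup, and `suggest_build_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def suggest_build_url(base_url, core,
                      handler="/suggestAuto",
                      dictionary="analyzingSuggester"):
    """Build the URL that asks the suggest handler to (re)build its dictionary.

    `handler` and `dictionary` default to the names used in the config above;
    `base_url` and `core` must be supplied by the caller.
    """
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "suggest.build": "true",  # triggers the dictionary build
    }
    return f"{base_url.rstrip('/')}/{core}{handler}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and core name; adjust to the actual installation.
    print(suggest_build_url("http://localhost:8983/solr", "mycore"))
```

Fetching that URL once (with a browser or curl, for example) starts the build, and that is when the temporary file begins to grow.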

Field type:

<fieldType name="text_suggestion_auto" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_general.txt" format="snowball"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_general.txt" format="snowball"/>
    </analyzer>
</fieldType>

Upvotes: 1
