Reputation: 1311
I'm using Solr 6.4 with Haystack 2.6.1 and pysolr 3.6.
I'm looking for Google-like autocomplete suggestions. Currently I use EdgeNGram, which works well, but it only returns my documents' titles, which is not what I want:
Example:
typing: 'new y'
returns:
New york, fabulous city that never sleep
A trip to new york by night
...
This only lets the user pick one particular document from the suggestion list, and the search then returns only documents matching the suggested title.
What I want instead is suggestions of relevant phrases, like:
typing: 'new y'
returns:
new york
new york by night
new york city
trip to new york
There is an article that suggests indexing the user queries that returned results and then using those queries as suggestions: https://lucidworks.com/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/
This means either parsing the Solr logs or using the Data Import Handler (DIH) to import a set of user queries saved in a database.
However, this article is fairly old (2009), and since then Solr has introduced the Suggester (https://cwiki.apache.org/confluence/display/solr/Suggester).
Is there a good tutorial on using the Suggester to return relevant phrases instead of my documents' titles, without having to save users' queries in a database, import them via a scheduled process, reindex, etc.?
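To make the log-parsing route concrete, here is a rough Python sketch that counts popular queries which actually returned results. It assumes request log lines containing path=/select params={...} followed by hits=N, which is roughly what Solr's request logging writes, but the exact layout depends on your Solr version and logging configuration; popular_queries is a hypothetical helper, not part of Solr or of the article.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Assumed shape of a Solr request log line (varies by version/logging config):
# ... webapp=/solr path=/select params={q=new+york&wt=json} hits=3 status=0 QTime=1
REQUEST_RE = re.compile(r"path=/select params=\{(?P<params>.*?)\}\s+hits=(?P<hits>\d+)")


def popular_queries(log_lines, min_hits=1):
    """Count normalized user queries that returned at least min_hits documents."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or int(match.group("hits")) < min_hits:
            continue
        # parse_qs decodes '+' and %-escapes in the logged parameter string
        params = parse_qs(match.group("params"))
        for q in params.get("q", []):
            normalized = " ".join(q.lower().split())
            if normalized:
                counts[normalized] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "INFO webapp=/solr path=/select params={q=New+York&wt=json} hits=12 status=0 QTime=3",
        "INFO webapp=/solr path=/select params={q=new+york&wt=json} hits=12 status=0 QTime=2",
        "INFO webapp=/solr path=/select params={q=new+yrok&wt=json} hits=0 status=0 QTime=1",
    ]
    print(popular_queries(sample).most_common())
```

Queries with zero hits (like the typo above) are dropped, so only queries that led somewhere would become suggestions.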
My search_indexes.py
from haystack import indexes

from .models import Article  # assumes Article is defined in this app's models.py


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    created = indexes.DateTimeField(model_attr='created')
    rating = indexes.IntegerField(model_attr='rating')
    title = indexes.CharField(model_attr='title', boost=1.125)
    term = indexes.EdgeNgramField(model_attr='title')

    def get_model(self):
        return Article
My article_text.txt
{{ object.title }}
{{ object.created }}
{{ object.rating }}
My schema.xml
<field name="term" type="text_general" indexed="true" stored="true" />
<field name="weight" type="float" indexed="true" stored="true" />
<fieldType name="edge_ngram" class="solr.TextField" positionIncrementGap="1">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front" />
  </analyzer>
</fieldType>
<fieldType name="suggestType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
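For illustration, here is a rough Python approximation of what the edge_ngram field type indexes for a title: every front prefix of 2 to 15 characters of each lowercased word. It is a simplification that mimics only the whitespace tokenizer, the lowercase filter and the EdgeNGram filter with the minGramSize/maxGramSize values above; WordDelimiterFilter splitting is not modelled, and edge_ngrams is a hypothetical helper name. Because each word is indexed as its prefixes, the field matches documents by word prefix; it does not produce phrase-level suggestions by itself.

```python
def edge_ngrams(text, min_gram=2, max_gram=15):
    """Approximate the edge_ngram analyzer: whitespace split, lowercase, front edge n-grams.

    Simplification: WordDelimiterFilterFactory splitting is not modelled.
    """
    grams = []
    for token in text.lower().split():
        # Words shorter than min_gram produce no grams at all.
        for size in range(min_gram, min(max_gram, len(token)) + 1):
            grams.append(token[:size])
    return grams


print(edge_ngrams("A trip to New York"))
```

For "A trip to New York" this yields tr, tri, trip, to, ne, new, yo, yor, york; the single-letter "a" produces nothing.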
My solrconfig.xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <str name="suggest.onlyMorePopular">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.collate">true</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="highlight">false</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">term</str>
    <str name="weightField">weight</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
I use pysolr to query Solr directly, as Haystack doesn't implement the suggest method yet:
from django.conf import settings
from pysolr import Solr

solr = Solr(settings.HAYSTACK_CONNECTIONS['default']['URL'], search_handler='/suggest', use_qt_param=False)
# query_string holds the user's partial input, e.g. 'new y'
raw_results = solr.search('', **{'suggest.q': query_string})
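The suggestions themselves are not in the usual docs list: in a SuggestComponent JSON response they sit under suggest, then the dictionary name, then the query string, then suggestions. Assuming your pysolr version exposes the decoded JSON as raw_results.raw_response, a small helper like the hypothetical suggestion_terms below can flatten that into a list of strings. The sample dict is hand-written in the response's shape, not real output.

```python
def suggestion_terms(response, dictionary=None):
    """Flatten a Solr SuggestComponent JSON response into a list of suggestion strings.

    response: the decoded JSON dict (e.g. raw_results.raw_response with pysolr).
    dictionary: suggester name to read; if None, all dictionaries are read.
    """
    terms = []
    suggest = response.get("suggest", {})
    for name, by_query in suggest.items():
        if dictionary is not None and name != dictionary:
            continue
        # One entry per suggest.q value, each with numFound and a suggestions list.
        for result in by_query.values():
            for suggestion in result.get("suggestions", []):
                term = suggestion["term"]
                if term not in terms:  # keep the first occurrence, drop duplicates
                    terms.append(term)
    return terms


sample = {
    "suggest": {
        "infixSuggester": {
            "new y": {
                "numFound": 2,
                "suggestions": [
                    {"term": "New york, fabulous city that never sleep", "weight": 0, "payload": ""},
                    {"term": "A trip to new york by night", "weight": 0, "payload": ""},
                ],
            }
        }
    }
}
print(suggestion_terms(sample, "infixSuggester"))
```

In a view you would then call something like suggestion_terms(raw_results.raw_response, 'infixSuggester').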
Upvotes: 1
Views: 2512
Reputation: 1308
For what you need, I suggest using the BlendedInfixLookupFactory set up as follows:
In schema.xml, create a field to use for the suggester, then copy your title into that field:
<field name="title" type="text_general" indexed="true" stored="true" />
<field name="term_suggest" type="phrase_suggest" indexed="true" stored="true" multiValued="true"/>
<copyField source="title" dest="term_suggest"/>
<fieldType name="phrase_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Then in the solrconfig.xml file:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest</str>
    <str name="lookupImpl">BlendedInfixLookupFactory</str>
    <str name="blenderType">linear</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">term_suggest</str>
    <str name="weightField">weight</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="queryAnalyzerFieldType">phrase_suggest</str>
    <str name="indexPath">suggest</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <bool name="exactMatchFirst">true</bool>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="wt">json</str>
    <str name="indent">false</str>
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
With the BlendedInfixLookupFactory you can find "new y" wherever it occurs in the field, with matches near the beginning weighted more heavily. Using the standard tokenizer for the suggestAnalyzerFieldType together with the keyword tokenizer for the queryAnalyzerFieldType lets you search with spaces: the query "new y" is read as a single string (keyword).
The Confluence wiki link you posted is a good reference; it was last modified in September 2016.
EDIT: I didn't realize you didn't want whole titles. You can try using shingles for this by changing the phrase_suggest fieldType in the schema above to:
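The "greater weight at the beginning" part comes from the blender. The Solr Reference Guide describes blenderType=linear as weight * (1 - 0.10 * position), where position is where the first match occurs in the suggestion. The Python sketch below models only that re-ranking step on hand-made (text, weight) pairs. Its notion of a match (a token equal to a complete query word, or starting with the last, partially typed word) is a deliberate simplification; which suggestions match at all is decided by the infix suggester itself, and linear_blend / first_match_position are hypothetical names.

```python
LINEAR_COEF = 0.10  # "linear" blender: score = weight * (1 - 0.10 * position)


def first_match_position(suggestion, query):
    """Index of the first suggestion token matching the query (simplified).

    A token matches if it equals one of the complete query words or starts
    with the last (partially typed) query word.
    """
    parts = query.lower().split()
    if not parts:
        return None
    *complete, prefix = parts
    for position, token in enumerate(suggestion.lower().split()):
        if token in complete or token.startswith(prefix):
            return position
    return None


def linear_blend(candidates, query):
    """Rank (text, weight) pairs by the linear blend, dropping texts that don't match."""
    scored = []
    for text, weight in candidates:
        position = first_match_position(text, query)
        if position is None:
            continue
        scored.append((weight * (1 - LINEAR_COEF * position), text))
    # Highest blended score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(text, round(score, 2)) for score, text in scored]


print(linear_blend([
    ("A trip to new york by night", 100),
    ("New york, fabulous city that never sleep", 100),
    ("Paris by night", 100),
], "new y"))
```

With equal weights, the title starting with "New york" keeps its full weight while "A trip to new york by night" (first match at position 3) drops to 70.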
<fieldType name="phrase_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="4"
            outputUnigrams="true"
            outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>
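To get a feel for the tokens this shingled field type produces, here is a rough Python approximation of standard tokenizer + lowercase + ShingleFilter with minShingleSize=2, maxShingleSize=4 and outputUnigrams=true. Tokenization is simplified to whitespace splitting with surrounding punctuation stripped, outputUnigramsIfNoShingles is not modelled, and shingles is a hypothetical helper. It only shows what gets indexed; whether the suggester returns these shingles or the whole field value depends on the lookup implementation.

```python
def shingles(text, min_size=2, max_size=4, output_unigrams=True):
    """Approximate StandardTokenizer + LowerCase + ShingleFilter output for one string.

    Simplification: tokens are split on whitespace with surrounding punctuation stripped.
    At each position the unigram (if enabled) comes first, followed by shingles of
    increasing size starting at that position.
    """
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))
    return out


print([s for s in shingles("A trip to new york by night") if s.startswith("new y")])
```

Filtering by the typed prefix "new y" leaves "new york", "new york by" and "new york by night", which is the kind of phrase-level candidate the question asks for.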
EDIT2: Alternatively, you could define phrase_suggest with a standard tokenizer plus a shingle filter for the index analyzer, and a keyword tokenizer for the query analyzer:
<fieldType name="phrase_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2"
            maxShingleSize="4"
            outputUnigrams="true"
            outputUnigramsIfNoShingles="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Then for the suggest searchComponent, you just need:
<str name="suggestAnalyzerFieldType">phrase_suggest</str>
(and no queryAnalyzerFieldType). Of course, you'll need to change the ShingleFilterFactory settings to fit your needs.
Upvotes: 0
Reputation: 1311
After struggling for hours I finally got something working. It's not perfect, but it's good enough.
Following this article: http://alexbenedetti.blogspot.fr/2015/07/solr-you-complete-me.html
I used the FreeTextLookupFactory.
My search_indexes.py
from haystack import indexes

from .models import Article  # assumes Article is defined in this app's models.py


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    created = indexes.DateTimeField(model_attr='created')
    rating = indexes.IntegerField(model_attr='rating')
    title = indexes.CharField(model_attr='title', boost=1.125)

    def get_model(self):
        return Article
My schema.xml
<field name="django_ct" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="django_id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="text" type="text_en" indexed="true" stored="true" multiValued="false" termVectors="true" />
<field name="rating" type="long" indexed="true" stored="true" multiValued="false"/>
<field name="title" type="text_en" indexed="true" stored="true" multiValued="false"/>
<field name="created" type="date" indexed="true" stored="true" multiValued="false"/>
My solrconfig.xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggest</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="ngrams">3</str>
    <float name="threshold">0.004</float>
    <str name="highlight">false</str>
    <str name="buildOnCommit">false</str>
    <str name="separator"> </str>
    <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy" >
  <lst name="defaults">
    <str name="suggest.dictionary">suggest</str>
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
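As I understand it, FreeTextLookupFactory builds an n-gram language model (here up to ngrams=3) over the title tokens and, at query time, uses the previous words plus the prefix of the word being typed to predict that word, scoring candidates with "stupid backoff" (I believe Lucene uses a 0.4 backoff factor). The toy Python sketch below illustrates only that idea on a few in-memory titles: it is not Lucene's FreeTextSuggester (which uses an FST and the configured analyzer), and build_model / free_text_suggest are hypothetical names.

```python
from collections import Counter

ALPHA = 0.4  # stupid-backoff penalty applied each time we fall back to a shorter context


def build_model(texts, max_n=3):
    """Count every 1..max_n-gram over lowercased, whitespace-split texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def free_text_suggest(counts, query, max_n=3, limit=5):
    """Complete the last, partially typed word of query using stupid backoff."""
    tokens = query.lower().split()
    if not tokens:
        return []
    *context, prefix = tokens
    unigram_total = sum(c for gram, c in counts.items() if len(gram) == 1)
    scores = {}
    backoff = 1.0
    # Longest usable context first (at most max_n - 1 previous words), then shorter ones.
    for ctx_len in range(min(len(context), max_n - 1), -1, -1):
        ctx = tuple(context[len(context) - ctx_len:])
        ctx_count = counts[ctx] if ctx else unigram_total
        if ctx_count:
            for gram, c in counts.items():
                if len(gram) == ctx_len + 1 and gram[:-1] == ctx and gram[-1].startswith(prefix):
                    phrase = " ".join(context + [gram[-1]])
                    # A word keeps the score from the longest context that predicted it.
                    scores.setdefault(phrase, backoff * c / ctx_count)
        backoff *= ALPHA
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


model = build_model(["a trip to new york by night", "new york city guide", "new year in new york"])
print(free_text_suggest(model, "new y"))
```

On these three titles, "new y" ranks "new york" (3 of the 4 occurrences of "new" are followed by "york") above "new year", which is why this lookup yields phrase-like suggestions rather than whole titles.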
Since Solr 6.4 uses managed schema mode by default (which ignored my changes to schema.xml), I had to switch to classic, manually edited schema mode by adding this to solrconfig.xml:
<schemaFactory class="ClassicIndexSchemaFactory"/>
Then restart Solr and rebuild the index with Haystack's rebuild_index command.
And of course, build the suggester with curl:
curl http://127.0.0.1:8983/solr/collection1/suggest?suggest.build=true
And finally the results:
curl http://127.0.0.1:8983/solr/collection1/suggest?suggest.q=new%20y
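If you'd rather trigger the build and the query from Python than from curl, something along these lines should work with only the standard library. The base URL and the collection1 core name are taken from the curl commands above; suggest_url and fetch_json are hypothetical helpers. wt=json is appended explicitly because Solr 6.x answers in XML by default unless the handler sets wt, and urlencode encodes the space in "new y" as "+", which Solr decodes back to a space.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://127.0.0.1:8983/solr/collection1/suggest"


def suggest_url(base=BASE, **params):
    """Build a /suggest URL; '_' in keyword names stands for '.' (suggest_q -> suggest.q)."""
    query = {key.replace("_", "."): value for key, value in params.items()}
    return base + "?" + urlencode(query)


def fetch_json(url, timeout=10):
    """GET a Solr URL, forcing wt=json, and return the decoded response."""
    separator = "&" if "?" in url else "?"
    with urlopen(url + separator + "wt=json", timeout=timeout) as response:
        return json.load(response)
```

Usage: call fetch_json(suggest_url(suggest_build="true")) once after rebuild_index, then fetch_json(suggest_url(suggest_q="new y")) for each keystroke you want to complete.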
I will try to dig deeper into the FreeTextLookupFactory to see if I can make it more accurate, but it is already satisfying. Hope this helps.
PS: always keep an eye on the logs at http://127.0.0.1:8983/solr/#/~logging. I strongly suggest keeping that page open in a tab at all times; it saved me hours of pain...
Upvotes: 2