Reputation: 39
Consider a car database containing something like:
I have a schema that would return results for partial matches. As you can see I have limited the minimum character to be considered to 2:
<fieldType class="solr.TextField" name="string_contains" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
<filter class="solr.ReverseStringFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
<filter class="solr.ReverseStringFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
</analyzer>
<analyzer type="query">
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldType>
So if a user searches for 'ercedes' both Mercedes entries would be returned. If a user searches for 'C' or '3', nothing will be returned since the schema sets a minimum of 2 characters.
I also have the following schema, which will return any exact matches:
<fieldType class="solr.TextField" name="textStemmed" omitNorms="true" positionIncrementGap="0">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="querystopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
Using the above, searching 'C' would return 'Mercedes C class' because it is an exact match, but nothing for a partial match.
Is it possible to somehow have a schema which works similarly to the first one, ie it can return partial matches but can also return matches to single character terms when they are an exact match?
thanks Mark
Upvotes: 0
Views: 472
Reputation: 15771
you can do this:
You might want to tweak your analysis chains, but you see how it works, use different variants of the same fields(you can add more of course), with different boosts.
Upvotes: 1