Reputation: 39

Solr: Integrating Partial Match and Exact Match results

Consider a car database containing something like:

Mercedes C class
Mercedes A class
BMW 3 Series
Mazda 3

I have a schema that would return results for partial matches. As you can see I have limited the minimum character to be considered to 2:

<fieldType class="solr.TextField" name="string_contains" positionIncrementGap="100">
   <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
        <filter class="solr.ReverseStringFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
        <filter class="solr.ReverseStringFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   </analyzer>
   <analyzer type="query">
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
</fieldType>

So if a user searches for 'ercedes' both Mercedes entries would be returned. If a user searches for 'C' or '3', nothing will be returned since the schema sets a minimum of 2 characters.

I also have the following schema, which will return any exact matches:

<fieldType class="solr.TextField" name="textStemmed" omitNorms="true" positionIncrementGap="0">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="querystopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>

Using the above, searching 'C' would return 'Mercedes C class' because it is an exact match, but nothing for a partial match.

Is it possible to somehow have a schema which works similarly to the first one, ie it can return partial matches but can also return matches to single character terms when they are an exact match?

thanks Mark

Upvotes: 0

Answers (1)

Persimmonium

Reputation: 15789

you can do this:

declare two (or more) fields 'carpartial' defined as string_contains, 'carexact' as textStemmed.
use copyfield to copy the original field into those additional fields
you use edismax handler to query those two fields, but boosting one more than the other: qf=string_contains^4 textStemmed^6

You might want to tweak your analysis chains, but you see how it works, use different variants of the same fields(you can add more of course), with different boosts.

Upvotes: 1

Solr: Integrating Partial Match and Exact Match results

Answers (1)

Related Questions