user2630771

Solr: Integrating Partial Match and Exact Match Results

Consider a car database containing something like:

  1. Mercedes C class
  2. Mercedes A class
  3. BMW 3 Series
  4. Mazda 3

I have a field type that returns results for partial matches. As you can see, I have set the minimum gram size to 2 characters:

<fieldType class="solr.TextField" name="string_contains" positionIncrementGap="100">
   <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <!-- front n-grams of each token, i.e. its prefixes of 2 to 15 characters -->
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
        <!-- reverse, n-gram again, reverse back: yields the suffixes of every prefix,
             i.e. every substring of at least 2 characters -->
        <filter class="solr.ReverseStringFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="15" minGramSize="2"/>
        <filter class="solr.ReverseStringFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   </analyzer>
   <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <!-- lowercase query terms so they can match the lowercased index-time grams -->
        <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType>

So if a user searches for 'ercedes', both Mercedes entries are returned. If a user searches for 'C' or '3', nothing is returned, because minGramSize="2" means no single-character grams are ever indexed.

I also have the following schema, which will return any exact matches:

<fieldType class="solr.TextField" name="textStemmed" omitNorms="true" positionIncrementGap="0">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="querystopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
</fieldType>

With this field type, searching for 'C' returns 'Mercedes C class' because it is an exact token match, but a partial term such as 'ercedes' returns nothing.

Is it possible to have a schema that works like the first one, i.e. one that returns partial matches, but that also returns matches for single-character terms when they match exactly?

Thanks, Mark

Answers (1)

Persimmonium

You can do this:

  1. Declare two (or more) fields: 'carpartial' of type string_contains and 'carexact' of type textStemmed.
  2. Use copyField to copy the original field into both of those additional fields.
  3. Query both fields with the edismax query parser, boosting one more than the other: qf=carpartial^4 carexact^6
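
Steps 1 and 2 might look roughly like the following schema.xml fragment. It is only a sketch: the source field name `car`, its `string` type, and the `indexed`/`stored` settings are assumptions, since the question does not show its field definitions.

```xml
<!-- Assumed original field (name and type are hypothetical); stored so it can be returned -->
<field name="car" type="string" indexed="true" stored="true"/>

<!-- Analysed variants used only for matching, so they need not be stored -->
<field name="carpartial" type="string_contains" indexed="true" stored="false"/>
<field name="carexact" type="textStemmed" indexed="true" stored="false"/>

<!-- Copy the raw value of "car" into both variants at index time -->
<copyField source="car" dest="carpartial"/>
<copyField source="car" dest="carexact"/>
```

copyField copies the raw input value, not the analysed tokens, so each destination field runs it through its own analyzer.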

You might want to tweak your analysis chains, but that is the idea: index different variants of the same field (you can add more, of course) and give each one a different boost.
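
Step 3 can be set as the default for a request handler in solrconfig.xml; the same `defType` and `qf` values can instead be passed as request parameters. The handler name `/cars` and the `fl` list below are assumptions for illustration.

```xml
<!-- Hypothetical handler; these defaults apply unless overridden per request -->
<requestHandler name="/cars" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Use the Extended DisMax query parser -->
    <str name="defType">edismax</str>
    <!-- Search both variants; exact token matches are boosted above substring matches -->
    <str name="qf">carpartial^4 carexact^6</str>
    <!-- Return the stored original value and the score -->
    <str name="fl">car,score</str>
  </lst>
</requestHandler>
```

With this in place, q=C should find 'Mercedes C class' only through carexact (carpartial indexes no single-character grams), while q=ercedes matches only through carpartial. With edismax's default tie=0.0, each query term is scored by its best-matching field after boosting, so adjust the ^4/^6 values to control how strongly exact matches outrank substring matches.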
