Ankita

Reputation: 1456

Solr search with misspelled words

I have integrated Solr with my eCommerce web application. I index the product title and many other product fields in Solr. I have indexed BLÅBÆRSOMMEREN as a product title/name, and I have also added EdgeNGram to the title field. Because of EdgeNGram, searching for any of the generated tokens returns the product. Because of spell check, searching for a misspelling such as BLÅBÆRISOMMEREN also returns it. But if I search for BLÅBÆRI, I get no results, because there is no such token.

I want the results to include products that contain BLÅBÆR, because that token exists. The same should apply to any other misspelled search.

How can I achieve this? Any help would be appreciated!

Thanks.

Upvotes: 1

Views: 1860

Answers (2)

Peter Dixon-Moses

Reputation: 3209

For misspelled words you can use a fuzzy query (allowing matches on index terms with an edit distance of ~1 or ~2 from the query term).

Using your example, BLÅBÆRISOMMEREN is edit distance 1 (one character difference) from your indexed term.

Therefore, the query q=title:BLÅBÆRISOMMEREN~1 will match your title term, but BLÅBÆRI will not (unless you use the ngram approach from the other answer).
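To see why the fuzzy query helps with one term but not the other, here is a minimal sketch that computes a plain Levenshtein distance in Python. It is only an illustration: the `edit_distance` helper is hypothetical, and Solr's own fuzzy matching is implemented differently (it also caps the allowed distance at 2), but the distances involved come out the same for these two inputs.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


indexed = "BLÅBÆRSOMMEREN"
print(edit_distance("BLÅBÆRISOMMEREN", indexed))  # 1, so ~1 matches
print(edit_distance("BLÅBÆRI", indexed))          # far more than 2, so no fuzzy match
```

The short query is too far from the full indexed term for any fuzzy setting to bridge, which is why it needs prefix (ngram) matching instead.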

You can also investigate Solr's Suggester component if you're trying to build auto-suggest, as it can also handle fuzzy suggestions such as BLÅBÆRI -> BLÅBÆRSOMMEREN, and it typically responds faster than a traditional query.

Upvotes: 1

Toby Cole

Reputation: 66

It sounds like you may have Solr's tokenization configured differently for indexing and querying.

So, in your example the following terms may appear in the index:

  • B
  • BL
  • BLÅ
  • BLÅB
  • BLÅBÆ
  • BLÅBÆR
  • BLÅBÆRS

However, because your query terms are not being processed into ngrams, you are only searching for

  • BLÅBÆRI

which does not appear within your indexed terms.
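As a rough illustration of the mismatch, the sketch below mimics edge ngram generation in Python. The `edge_ngrams` helper and its default sizes are assumptions chosen to mirror the list above, not Solr's actual implementation, and it ignores lowercasing.

```python
def edge_ngrams(term: str, min_size: int = 1, max_size: int = 15) -> list:
    """Return the front edge ngrams of term, shortest first."""
    upper = min(len(term), max_size)
    return [term[:n] for n in range(min_size, upper + 1)]


# Index time: the title is expanded into its prefixes.
indexed_terms = set(edge_ngrams("BLÅBÆRSOMMEREN"))

# Query time without ngrams: the whole query must equal an indexed term.
query = "BLÅBÆRI"
print(query in indexed_terms)  # False

# Query time with ngrams: each prefix of the query is looked up instead.
matches = [g for g in edge_ngrams(query) if g in indexed_terms]
print(matches[-1])  # BLÅBÆR
```

With ngrams applied to the query as well, the longest shared prefix (BLÅBÆR) is enough to produce a hit, at the cost of many shorter prefixes matching other products too.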

This is a common practice when using ngrams. However, it sounds like in your use case you want partial matches to be returned in your results.

Check your Solr schema to make sure that you have an EdgeNGram filter configured at query time that matches the one used at index time, e.g.

<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
   <analyzer type="query">
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
   </analyzer>
</fieldType>

Make sure you're sorting by score, though, as this strategy will most likely give you many false positives!

Upvotes: 2
