Reputation: 1456
I have integrated Solr with my e-commerce web application. I am indexing the product title and many other product fields into Solr. I have indexed BLÅBÆRSOMMEREN as a product title/name, and I have also added EdgeNGram to the title field. Because of EdgeNGram, if I search for any of the tokens, I get the result. Because of spell check, if I search for a misspelling such as BLÅBÆRISOMMEREN, I also get the result. But if I search for BLÅBÆRI, I don't get any result, because there is no matching token for it.
I want the results to include products containing BLÅBÆR, because that token exists. The same goes for any other misspelled search.
How can I achieve this? Any help will be appreciated!
Thanks.
Upvotes: 1
Views: 1860
Reputation: 3209
For misspelled words you can use a fuzzy query (allowing matches on index terms with an edit distance of ~1 or ~2 from the query term).
Using your example, BLÅBÆRISOMMEREN is edit distance 1 (one character difference) from your indexed term.
Therefore the query q=title:BLÅBÆRISOMMEREN~1 will match your title term, but BLÅBÆRI will not (at least not without the ngram approach from the other answer).
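To see why the fuzzy query rescues one misspelling but not the other, here is a small pure-Python sketch of the edit-distance check behind ~N. It is only an illustration: Lucene implements fuzzy matching with Levenshtein automata over the term dictionary and, by default, also counts a transposition of two adjacent characters as a single edit, which this plain Levenshtein version does not. The helper names levenshtein and fuzzy_matches are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when the chars are equal
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, indexed_term: str, max_edits: int) -> bool:
    """Rough stand-in for title:query~max_edits against a single indexed term."""
    if not 0 <= max_edits <= 2:
        # Lucene's fuzzy matching supports at most 2 edits
        raise ValueError("max_edits must be 0, 1 or 2")
    return levenshtein(query.lower(), indexed_term.lower()) <= max_edits


indexed = "BLÅBÆRSOMMEREN"
for q in ("BLÅBÆRISOMMEREN", "BLÅBÆRI"):
    print(q, levenshtein(q, indexed), fuzzy_matches(q, indexed, 2))
```

BLÅBÆRISOMMEREN is one deleted character away from the indexed term, while BLÅBÆRI needs 8 edits (one substitution plus seven insertions), far beyond the 2-edit limit, so even ~2 cannot reach it.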
You can also look into Solr's Suggester component if you're trying to build auto-suggest, as it can also handle fuzzy suggestions (e.g. BLÅBÆRI -> BLÅBÆRSOMMEREN) and typically responds faster than a traditional query.
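The key difference from ~N is that fuzzy suggestion compares the typed text against prefixes of indexed terms, which is how BLÅBÆRI can lead to BLÅBÆRSOMMEREN. The Python sketch below illustrates only that matching idea; it is not how Solr's FuzzyLookupFactory is implemented, and the function names and the title list are hypothetical.

```python
def fuzzy_prefix_distance(query: str, term: str) -> int:
    """Smallest edit distance between query and any prefix of term."""
    q, t = query.lower(), term.lower()
    prev = list(range(len(t) + 1))  # "" vs t[:j] needs j insertions
    for i, cq in enumerate(q, start=1):
        curr = [i]  # q[:i] vs "" needs i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cq != ct)))
        prev = curr
    # prev[j] is now the distance between q and t[:j]; keep the best prefix
    return min(prev)


def suggest(query: str, terms: list[str], max_edits: int = 1) -> list[str]:
    """Terms with a prefix within max_edits of query, closest (then alphabetical) first."""
    scored = [(fuzzy_prefix_distance(query, t), t) for t in terms]
    return [t for d, t in sorted(scored) if d <= max_edits]


titles = ["BLÅBÆRSOMMEREN", "BLÅBÆRSYLTETØY", "JORDBÆRSOMMEREN"]  # hypothetical products
print(suggest("BLÅBÆRI", titles))
```

BLÅBÆRI is one edit away from the prefix BLÅBÆRS (or BLÅBÆR), so both BLÅBÆR titles are suggested, while JORDBÆRSOMMEREN is not.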
Upvotes: 1
Reputation: 66
It sounds like you may have Solr's tokenization configured differently for indexing and querying.
So, in your example, the following terms may appear in the index: BL, BLÅ, BLÅB, BLÅBÆ, BLÅBÆR, BLÅBÆRS and so on, up to BLÅBÆRSOMMEREN.
However, as your query terms are not being processed into ngrams, you are only searching for BLÅBÆRI, which does not appear among your indexed terms.
This is common practice when using ngrams; however, it sounds like in your use case you want to return partial matches in your results.
Check your Solr schema to make sure that you have a matching EdgeNGram filter configured at query time as you do at index time, e.g.:
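<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- LowerCaseTokenizerFactory is deprecated/removed in newer Solr versions: tokenize, then lowercase -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- newer versions reject the "side" attribute; edge n-grams are always taken from the front -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- same chain at query time, so BLÅBÆRI is also split into bl, blå, ..., blåbæri -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>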
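A rough simulation of what the two analyzers do to BLÅBÆRSOMMEREN and BLÅBÆRI, written in Python for illustration only. It assumes a single token per title and the minGramSize="2" / maxGramSize="15" values from the field type above; edge_ngrams and matches are hypothetical helpers, and counting a document as matched when any query-side gram is an indexed term is a simplification of Solr's default OR behaviour.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Lowercased front edge n-grams, in the spirit of EdgeNGramFilterFactory."""
    token = token.lower()
    # Tokens shorter than min_gram produce no grams at all
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def matches(query: str, indexed_title: str, ngram_query: bool) -> bool:
    """True if any query-side term is one of the title's indexed grams."""
    index_terms = set(edge_ngrams(indexed_title))
    # Without a query-side EdgeNGram filter the whole (lowercased) word is the only term
    query_terms = edge_ngrams(query) if ngram_query else [query.lower()]
    return any(term in index_terms for term in query_terms)


title = "BLÅBÆRSOMMEREN"
print(edge_ngrams(title))
print(matches("BLÅBÆRI", title, ngram_query=False))
print(matches("BLÅBÆRI", title, ngram_query=True))
```

The first match check fails because blåbæri is not one of the indexed grams; the second succeeds because query-side grams such as blåbær are.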
Make sure you're sorting by score, though, as this strategy will most likely give you many false positives!
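The false-positive problem, and why sorting by score helps, can be illustrated with the same gram logic. This is a hypothetical Python sketch: Solr's real relevance scoring (BM25 by default in recent versions) is not a simple overlap count, and the titles are invented examples. A title that shares only the two-letter gram bl with the query still counts as a hit, but it ranks below a title that shares five grams.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 15) -> list[str]:
    """Lowercased front edge n-grams of one token."""
    token = token.lower()
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def grams(text: str) -> set[str]:
    """Edge n-grams of every whitespace-separated word in text."""
    return {g for word in text.split() for g in edge_ngrams(word)}


def overlap_score(query: str, title: str) -> int:
    """Hypothetical relevance stand-in: how many query grams the title shares."""
    return len(grams(query) & grams(title))


def search(query: str, titles: list[str]) -> list[str]:
    scored = [(overlap_score(query, t), t) for t in titles]
    hits = [(s, t) for s, t in scored if s > 0]  # any shared gram, even "bl", is a hit
    hits.sort(key=lambda st: (-st[0], st[1]))    # highest score first
    return [t for _, t in hits]


titles = ["BLOMKÅL", "BLÅBÆRSOMMEREN", "JORDBÆRSOMMEREN"]  # hypothetical products
print(search("BLÅBÆRI", titles))
```

BLOMKÅL is returned only because of the shared gram bl; without sorting by score it could appear above the product the user actually meant.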
Upvotes: 2