pavan kumar

How to ignore accents when searching in Solr

I am using Solr as a search engine. I have a text field that contains accented text such as "María". When a user searches for "María", Solr returns results, but when a user searches for "Maria", it returns nothing.

My schema definition looks like this:

<fieldtype name="my_text" class="solr.TextField">
    <analyzer type="Index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldtype>

Please help me solve this issue.

Upvotes: 6

Answers (2)

sodimel

Answering here because this is the first result that pops up when searching for "ignore accents solr".

In the schema.xml generated by Haystack (used together with aldryn_search, djangocms and djangocms-blog), @soulcheck's answer works if you add the <filter class="solr.ASCIIFoldingFilterFactory"/> line to the text_en fieldType.
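As a rough sketch only, here is where that line could go. The text_en below is a hypothetical, simplified field type, not the definition Haystack actually generates, and solr.PorterStemFilterFactory stands in for whatever stemmer your generated schema uses. It places the folding filter after the stemmer, following the ordering advice in the other answer, and uses the same chain at index and query time.

```xml
<!-- Hypothetical, simplified text_en; not Haystack's generated definition. -->
<fieldtype name="text_en" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Placeholder stemmer: it still sees the original, accented tokens. -->
        <filter class="solr.PorterStemFilterFactory"/>
        <!-- Fold accented characters to ASCII after stemming. -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
        <!-- Same folding at query time so both sides produce matching terms. -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
</fieldtype>
```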

Upvotes: 0

soulcheck

If you're on Solr > 3.x, you can try solr.ASCIIFoldingFilterFactory, which converts accented (and other non-ASCII) characters into their ASCII equivalents from the first 127 ASCII characters (the Basic Latin block), where such an equivalent exists. For example, "í" becomes "i".

Remember to put it after any stemming filter you have configured (you're not using one, so you should be OK).

So your config could look like this:

<fieldtype name="my_text" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Fold before n-gramming so the indexed grams are accent-free too. -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="32" side="front"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Same folding at query time: "María" and "Maria" both become "maria". -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
</fieldtype>
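The field type only affects fields that reference it. A minimal sketch of such a field follows, where name is a hypothetical field name. Because the index-time chain changed, documents indexed before the change still carry the old, accented terms, so they need to be reindexed before a search for "Maria" matches them.

```xml
<!-- Hypothetical field using the updated my_text type. -->
<field name="name" type="my_text" indexed="true" stored="true"/>
```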

Upvotes: 13
