Reputation: 141
My field type in the schema is currently defined to do exact match only:
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
</fieldType>
Now I want to keep the exact match, but with special characters removed during indexing.
I read that StandardTokenizerFactory would remove the special characters. However, I don't want its side effect of splitting the phrase on whitespace.
Is it possible to use StandardTokenizerFactory at index time and KeywordTokenizerFactory at query time?
Any other ideas?
Upvotes: 1
Views: 3467
Reputation: 9320
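You could use one of Solr's CharFilterFactories, which process the raw text before it reaches the tokenizer. Two that may suit you:
solr.HTMLStripCharFilterFactory: strips HTML markup (tags such as <b>...</b>) and decodes character entities such as &amp;. It does not remove arbitrary special characters.
solr.PatternReplaceCharFilterFactory: replaces every match of a regular expression with the given replacement, for example:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="([^a-z])" replacement=""/>
This removes every character other than lowercase a-z. Because char filters run before the tokenizer and before LowerCaseFilterFactory, that also drops uppercase letters, digits, and spaces, so widen the character class (for example [^a-zA-Z0-9 ]) to keep whatever you don't consider a special character.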
For more info - https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#CharFilterFactories
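Putting this together with the field type from the question, a minimal sketch might look like the following. The field type name text_exact_nospecial and both regular expressions are assumptions chosen for illustration; adjust the character class to match what counts as a special character in your data. A single <analyzer> element without a type attribute is used for both indexing and querying, so both sides are normalized the same way.

```xml
<!-- Sketch only: the name "text_exact_nospecial" and both patterns are illustrative assumptions -->
<fieldType name="text_exact_nospecial" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run on the raw text, before the tokenizer:
         drop everything except ASCII letters, digits, and spaces -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9 ]" replacement=""/>
    <!-- Optional: collapse runs of whitespace left behind by removed characters -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
    <!-- KeywordTokenizerFactory keeps the whole value as one token, so the phrase is not split -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the special characters are stripped before tokenization, there is no need to switch to StandardTokenizerFactory, and the index-time and query-time tokens stay comparable. Existing documents would need to be reindexed after the field type changes.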
Upvotes: 1