Reputation: 11
I'm a Solr beginner thrown in at the deep end :) I'm dealing with a custom field type with filters defined as below:
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
After that, there's a stem filter defined.
I'd like to apply stemming only if the token is longer than X chars. Is this possible in Solr?
I know that there's a <filter class="solr.LengthFilterFactory" min="2" max="7"/>
available, but it will just drop words that don't match its criteria instead of letting them bypass the stemming.
Any ideas on how to solve this? Thanks in advance :)
Upvotes: 0
Views: 37
Reputation: 9789
Stemmers usually ignore words marked as keywords.
So, you want to add a KeywordMarkerFilterFactory to your chain before the stemmer.
To mark the words of at least X chars, you can use the pattern parameter, which takes a Java regular expression. So, even something as basic as ".{13,}" (match any token of 13 or more characters) should work.
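For the case in the question (stem only tokens longer than X chars), the pattern needs to match the short tokens rather than the long ones, since it is the marked tokens that skip stemming. A minimal sketch of the end of the analyzer chain, assuming X = 5 and a Porter stemmer (the question doesn't say which stemmer is used, so both are placeholders), and assuming the pattern has to match the whole token:

```xml
<!-- Sketch only: X = 5 and PorterStemFilterFactory are assumptions, not taken from the original chain. -->
<!-- Tokens of 1 to 5 characters are flagged as keywords, so a keyword-aware stemmer leaves them alone. -->
<filter class="solr.KeywordMarkerFilterFactory" pattern=".{1,5}"/>
<!-- Tokens longer than 5 characters are not flagged and get stemmed as usual. -->
<filter class="solr.PorterStemFilterFactory"/>
```

Both lines go after the ASCIIFoldingFilterFactory entry. The Analysis screen in the Solr admin UI is a quick way to check which tokens actually get stemmed.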
Upvotes: 1