Yeti

Reputation: 11

Apply Solr filter only if token is longer than X chars

I'm a Solr beginner thrown in at the deep end :) I'm dealing with a custom field type whose analysis chain is defined as below:

<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>

After that, there's a stem filter defined.

I'd like to apply stemming only if the token is longer than X chars. Is this possible in Solr?

I know that there's a <filter class="solr.LengthFilterFactory" min="2" max="7"/> available, but it will just drop tokens that don't match its criteria instead of letting them bypass the stemming.

Any ideas on how to solve it? Thanks in advance :)

Upvotes: 0

Views: 37

Answers (1)

Alexandre Rafalovitch

Reputation: 9789

Stemmers usually skip tokens that are marked as keywords.

So, you want to add a KeywordMarkerFilterFactory into your chain before the stemmer.

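To mark words by length, you can use the pattern parameter, which takes a Java regular expression that is matched against the whole token. Keep in mind that the marked tokens are the ones that skip stemming. So, something as basic as ".{13,}" (match any 13 characters or longer) would protect the long tokens; to stem only tokens longer than X chars, mark the short ones instead, e.g. ".{1,4}" for X = 4.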
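As a rough sketch of how this could look in the chain from the question: since marked tokens are the ones that bypass stemming, stemming only tokens longer than X means marking the tokens of X characters or fewer. The value X = 4 is just an assumption for illustration, and because the question doesn't say which stemmer is used, PorterStemFilterFactory stands in for it here.

```xml
<filter class="solr.ASCIIFoldingFilterFactory"/>
<!-- Assumption: X = 4. Tokens of 1 to 4 characters are marked as keywords,
     so a keyword-aware stemmer leaves them untouched. -->
<filter class="solr.KeywordMarkerFilterFactory" pattern=".{1,4}"/>
<!-- Assumption: placeholder for whatever stem filter the field type already defines. -->
<filter class="solr.PorterStemFilterFactory"/>
```

The marker filter has to sit before the stemmer, and it's worth checking in the Solr admin Analysis screen that your particular stemmer actually honours the keyword flag.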

Upvotes: 1
