Reputation: 2481
My client has several keywords that are commonly searched for that contain a letter and numbers:
M4
M12
M18
M28
When these get searched in Solr right now, they are getting tokenized as both the full string and as the letter M along with the number, so if someone searches for M12, the search is performed on M, 12, and M12.
What is the best way to prevent this so it only searches for M12?
EDIT: Figured I should include the tokenizer/filter configuration for the field's type, so here it is:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English" />
Upvotes: 0
Views: 1065
Reputation: 2481
Turns out the solution was pretty easy. Alex's comment helped me get there, but I ended up just modifying the word delimiter filter, setting splitOnNumerics="0":
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" />
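For context, the word delimiter filter splits on letter-to-digit transitions by default (splitOnNumerics defaults to 1), which is why M12 was also producing M and 12. Below is a rough sketch of how the full chain from the question might look with the change in place. The fieldType name and the surrounding fieldType/analyzer wrapper are assumptions for illustration, since the question only shows the tokenizer and filter lines:

```xml
<!-- Hypothetical field type: the name "text_model_codes" and this wrapper are
     assumed; only the tokenizer/filter chain comes from the question. -->
<fieldType name="text_model_codes" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnNumerics="0" stops the filter from splitting at letter/digit
         boundaries, so "M12" is no longer broken into "M" and "12" -->
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
```

If the same analyzer runs at index time, documents indexed before the change will still contain the split tokens, so a reindex is probably needed before results change. The Analysis screen in the Solr admin UI is a handy way to check what tokens M12 produces.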
Upvotes: 1