Charles Boyung

Reputation: 2481

Solr query with words containing letters and numbers

My client has several keywords that are commonly searched for that contain a letter and numbers:

M4
M12
M18
M28

When these are searched in Solr right now, they get tokenized both as the full string and as the letter M plus the number. So if someone searches for M12, the search is performed on M, 12, and M12.

What is the best way to prevent this so it only searches for M12?

EDIT: Figured I should include the tokenizer/filter configuration for the field's type, so here it is:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" />
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.SnowballPorterFilterFactory" language="English" />

Upvotes: 0

Views: 1065

Answers (1)

Charles Boyung

Reputation: 2481

Turns out the solution was pretty easy. Alex's comment helped me get there, but I ended up just modifying the word delimiter filter, setting splitOnNumerics="0":

<filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0" />
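
For context, here is a rough sketch of how the whole field type might look with that change applied. The filter chain is the one from the question; the field type name `text_keywords`, the `solr.TextField` class, and the single shared `<analyzer>` are assumptions for illustration, since the original schema wasn't posted.

```xml
<!-- Hypothetical field type; the name and the single shared analyzer are assumed. -->
<fieldType name="text_keywords" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splitOnNumerics="0": do not split at letter/digit transitions, so M12 stays a single token -->
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" splitOnNumerics="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

If the filter also runs at index time, as it would with a shared analyzer like the one sketched here, existing documents most likely need to be reindexed before the change shows up in results. The Analysis screen in the Solr admin UI is a handy way to check that M12 now comes out as one token.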

Upvotes: 1
