WordDelimiterFilterFactory not including all permutations

Question

I have a Solr index that has to deal with part numbers, which the WordDelimiterFilterFactory seems ideally suited for. An example part number could be "CH2300-100". I expect the following queries to match this field (and they do):

CH
CH2300-100
CH2300100

But the following query doesn't match:

CH2300

Looking at the debugging output, that combination of word parts isn't generated. I expected the catenateWords and/or catenateNumbers attributes to handle this case, but they don't seem to. Am I missing something in the configuration that would allow all permutations of the tokenized fragments to be matched?

accounted4 · Accepted Answer

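I suspect that 'CH2300' is not an indexed token because splitOnNumerics="1" is in effect. During the split phase the filter separates CH from 2300, and then applies all of the generators to those parts individually (as well as to the catenated tokens). Since catenateWords only joins adjacent word parts and catenateNumbers only joins adjacent number parts, the mixed combination CH2300 is never emitted on its own.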

One option is to add splitOnNumerics="0" to your filter factory. However, that may keep 'CH' from matching. Another option is to add a filter factory at query time that splits on numerics.
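To make the trade-off concrete, here is a rough Python model of the split and catenate steps. It is only a sketch, not Lucene's implementation: the function names are hypothetical, and it assumes generateWordParts, generateNumberParts, catenateWords, catenateNumbers and catenateAll are all enabled and that preserveOriginal is off.

```python
import re
from itertools import groupby


def kind(part):
    """Classify a fragment as all digits, all letters, or mixed."""
    if part.isdigit():
        return "digit"
    if part.isalpha():
        return "alpha"
    return "mixed"


def word_delimiter_tokens(text, split_on_numerics=True):
    """Approximate the set of tokens a word delimiter filter would emit."""
    # Split on non-alphanumeric delimiters first (the hyphen in "CH2300-100").
    chunks = [c for c in re.split(r"[^A-Za-z0-9]+", text) if c]

    parts = []
    for chunk in chunks:
        if split_on_numerics:
            # Break at letter/digit transitions: "CH2300" -> "CH", "2300".
            parts.extend(re.findall(r"[A-Za-z]+|[0-9]+", chunk))
        else:
            parts.append(chunk)

    # generateWordParts / generateNumberParts: every part is a token.
    tokens = set(parts)

    # catenateWords / catenateNumbers: join runs of adjacent parts of the
    # same kind. Letters and digits are never joined with each other here.
    for part_kind, group in groupby(parts, key=kind):
        run = list(group)
        if part_kind != "mixed" and len(run) > 1:
            tokens.add("".join(run))

    # catenateAll: every part joined together.
    tokens.add("".join(parts))
    return tokens


with_split = word_delimiter_tokens("CH2300-100", split_on_numerics=True)
without_split = word_delimiter_tokens("CH2300-100", split_on_numerics=False)

print(sorted(with_split))     # contains "CH" but not "CH2300"
print(sorted(without_split))  # contains "CH2300" but not "CH"
```

In this model neither setting yields both CH and CH2300, which is the gap the question describes.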

Edit

A third and possibly better option is to use a shingle filter factory and leave splitOnNumerics="1", so that CH, 2300, and CH2300 all get indexed. Adding a shingle filter after your word delimiter filter factory should solve the problem.
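What shingling adds can be sketched in a few lines of Python. This is an illustration of the idea only, not the Solr configuration: the `shingles` helper is hypothetical, and it assumes an empty token separator, a maximum shingle size of 3, and that single tokens (unigrams) are kept. The real filter also tracks token positions, which this sketch ignores.

```python
def shingles(tokens, max_size=3, separator=""):
    """Return every run of 1..max_size adjacent tokens, joined by separator."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out


# Parts produced by splitting "CH2300-100" on the hyphen and on numerics.
parts = ["CH", "2300", "100"]
result = shingles(parts)
print(result)
```

Because adjacent parts are joined regardless of whether they are letters or digits, CH2300 appears alongside CH, 2300 and CH2300100.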
