Reputation: 13
I do not know if this is a bug or a feature, but Solr's NGramFilterFactory does not seem to work on numbers.
Here is my field type:
<fieldType name="phone_test" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
When I use the analyser in the Solr admin interface and type a word, e.g. "business", it works fine, but when I enter a number, e.g. 12345678, it does not work.
What I want is to search for parts of phone numbers. If I have 123456789 as a phone number and I search for 456 or 6789, I should get a hit.
Any ideas?
Upvotes: 1
Views: 646
Reputation: 22555
The definition of the LowerCaseTokenizerFactory, which your field type uses, is as follows:
Creates tokens by lowercasing all letters and dropping non-letters.
The tokenizer drops your numbers because they are non-letters, so nothing ever reaches the NGramFilterFactory. I would recommend using the KeywordTokenizerFactory or the StandardTokenizerFactory instead, as these should handle your numeric input properly.
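As a rough sketch of that change, a field type along these lines might work. The name phone_ngram is just a placeholder I made up, and I have left out side="front" because, as far as I know, that attribute belongs to the edge n-gram filter rather than NGramFilterFactory. Treat it as a starting point to check in the analysis screen, not a tested configuration.

```xml
<!-- Hypothetical field type; the name "phone_ngram" is an assumption -->
<fieldType name="phone_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- KeywordTokenizerFactory keeps the whole input, digits included, as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Emits every substring of 3 to 30 characters, e.g. "456" and "6789" for 123456789 -->
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the query term is matched against the indexed grams as-is -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With this setup a query for 456 or 6789 should match the indexed grams of 123456789. Queries shorter than minGramSize (here, fewer than 3 characters) would not match, and you need to reindex after changing the field type.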
Upvotes: 2