I want to find "john doe" with a "hn do" search.
"*hn*" or "john\ d*" works, but when the query includes whitespace, "*hn\ do*" does not work. Escaping the wildcards does not help either.
My field definition is as follows:
<fieldType name="string" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!--<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25" side="back" />-->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Try using NGramTokenizerFactory. It generates n-gram tokens of every size in the given range, as below:
<analyzer>
  <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
</analyzer>
It works as follows:
In: "john doe"
Out: "jo","joh","john", "john ","john d","john do",
"john doe", "oh", "ohn","ohn ", "ohn d"...
Then remove the KeywordTokenizerFactory from the fieldType definition, since an analyzer can have only one tokenizer.
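Putting that together, here is a minimal sketch of what the field type might look like. The name text_ngram is hypothetical. Keeping KeywordTokenizerFactory on the query side is my own assumption rather than part of the suggestion above: the idea is that the search text stays a single token, so it can match one of the indexed n-grams directly.

```xml
<!-- Sketch only: "text_ngram" is a made-up name; adjust the gram sizes to your data -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Index every substring of length 2..10, whitespace included -->
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="10"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Assumption: keep the whole query as one token so "hn do" is looked up as-is -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, a quoted query such as "hn do" should match without any wildcards, as long as its length falls between minGramSize and maxGramSize. Longer queries would find nothing, because no n-gram of that size is indexed.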
You can also consider using solr.EdgeNGramTokenizerFactory. It has an additional attribute, side.
side: ("front" or "back", default is "front") Whether to compute the n-grams from the beginning (front) of the text or from the end (back).
With side="back", it works as follows:
In: "babaloo"
Out: "oo", "loo", "aloo", "baloo"
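For reference, a tokenizer configuration that would produce the output above looks roughly like this. The gram sizes are inferred from the example output (tokens of 2 to 5 characters), and whether the side attribute is accepted may depend on your Solr version, so treat it as a sketch:

```xml
<analyzer>
  <!-- Gram sizes inferred from the "babaloo" example; side="back" may not be accepted by every Solr version -->
  <tokenizer class="solr.EdgeNGramTokenizerFactory" minGramSize="2" maxGramSize="5" side="back"/>
</analyzer>
```

Note that edge n-grams only cover prefixes or suffixes of the text, so on their own they would not match an infix such as "hn do" inside "john doe".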
KeywordTokenizerFactory: This tokenizer treats the entire text field as a single token.