Reputation: 2373
I have a field type configured like this:
<fieldType name="gtext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!--Needed for efficient trailing wildcard queries-->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            stemEnglishPossessive="1"
            catenateAll="0"
            preserveOriginal="1"
    />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            stemEnglishPossessive="1"
            catenateAll="0"
            preserveOriginal="1"
    />
  </analyzer>
</fieldType>
When I search for fun, for example, documents containing funny are returned as well. How can I avoid this behavior so that only fun is matched? Is it caused by the reversed wildcard filter?
Upvotes: 0
Views: 1105
Reputation: 52769
This is caused by the EdgeNGramFilterFactory filter:
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front"/>
EdgeNGramFilterFactory generates edge n-grams for each token at index time. For example, funny would generate f, fu, fun, funn, funny.
So when you search for fun, documents containing funny match, because fun is one of the grams indexed for funny.
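If only whole-token matches are wanted, one option is to drop the edge n-gram filter from the index analyzer. The following is a minimal sketch, not a tested configuration: the type name gtext_exact is hypothetical, and every other setting is copied unchanged from the question. Because index-time analysis only runs when documents are added, existing documents would need to be reindexed for the change to take effect.

```xml
<!-- Hypothetical variant of the gtext type; the name "gtext_exact" is an assumption. -->
<fieldType name="gtext_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- EdgeNGramFilterFactory removed: "funny" is no longer indexed as f, fu, fun, funn, funny -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="2" maxPosQuestion="1" minTrailing="2" maxFractionAsterisk="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            stemEnglishPossessive="1"
            catenateAll="0"
            preserveOriginal="1"
    />
  </analyzer>
  <!-- Query analyzer is identical to the one in the question. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="1"
            stemEnglishPossessive="1"
            catenateAll="0"
            preserveOriginal="1"
    />
  </analyzer>
</fieldType>
```

The trade-off is that a plain query for fun would then no longer act as an implicit prefix search; an explicit wildcard query such as fun* would be needed to match funny.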
ReversedWildcardFilterFactory does not cause this issue; it only speeds up leading-wildcard queries.
For example, funny would also be indexed in reversed form as ynnuf, so a leading-wildcard query such as *nny can be rewritten as the prefix query ynn*, which performs much better.
Upvotes: 2