Reputation: 8505
I have configured a field in Solr as follows. When I search for the word "Conditioner", I was hoping to also match documents that contain "Conditioning". But according to the Solr Analysis screen, the PorterStemFilter reduces "Conditioning" to "condit" at index time. At query time, "Conditioner" is stemmed to "condition", so it does not match "Conditioning".
How can I configure stemming so that both "Conditioner" and "Conditioning" stem to "condition"?
<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
Upvotes: 2
Views: 1813
Reputation: 9500
I would also suggest trying a different stemmer. There are 4 included in Solr
Each of them produces different results for your problem, see below. Given those results, and the fact that it needs no external resource, I would opt for KStem. If you do not mind including a dictionary, I would go for Hunspell.
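As a rough sketch of the KStem option, the field type from the question could be adapted by replacing `solr.PorterStemFilterFactory` with `solr.KStemFilterFactory` in both analyzers. The name `text_general_kstem` is only a placeholder, and everything else is copied from the question's configuration; whether "Conditioner" and "Conditioning" really end up as the same token should be checked on the Analysis screen before reindexing.

```xml
<!-- Sketch only: "text_general_kstem" is a hypothetical name. -->
<fieldType name="text_general_kstem" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Terms listed in protwords.txt are still protected from stemming. -->
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- KStem in place of the Porter stemmer. -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <!-- Use the same stemmer at query time so query and index tokens agree. -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Changing the stemmer changes the indexed tokens, so existing documents would need to be reindexed for the new analysis chain to take effect.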
Upvotes: 4