Reputation: 55
Solr: 3.5
Hi,
I created a Dutch field type with the following fieldType definition:
<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StemmerOverrideFilterFactory" words="lang/stemdict_nl.txt" ignoreCase="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="Kp" words="lang/stemdict_nl.txt"/>
</analyzer>
</fieldType>
stemdict_nl.txt contains 45,710 word rules based on the Kraaij-Pohlmann algorithm (http://snowball.tartarus.org/algorithms/kraaij_pohlmann/stemmer.html).
Most search queries seem to work fine, and most of the suggestions I get are correct.
However, there is a problem when I search for 'etiketje'. My rules include:
etiket etiket
etiketten etiket
etiketteren etiketteer
etikettering etiketteer
etiketje etiket
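So 'etiketje' should be stemmed to 'etiket', but instead it ends up as 'etik'. When I analyse the field, Solr returns one token per analysis stage:
etiketje   (StandardTokenizer)
etiketje   (WordDelimiterFilter)
etiketje   (LowerCaseFilter)
etiketje   (StemmerOverrideFilter)
etik       (SnowballPorterFilter, Kp)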
I would like Solr to analyse 'Etiketje' as:
etiketje
etiket
Hopefully someone here can point me in the right direction.
Upvotes: 1
Views: 924
Reputation: 11023
Try changing your definition to match the syntax shown on the Solr wiki exactly, i.e. change
<filter class="solr.StemmerOverrideFilterFactory"
words="lang/stemdict_nl.txt" ignoreCase="true"/>
<filter class="solr.SnowballPorterFilterFactory"
language="Kp" words="lang/stemdict_nl.txt"/>
to
<filter class="solr.StemmerOverrideFilterFactory"
dictionary="lang/stemdict_nl.txt"/>
<filter class="solr.SnowballPorterFilterFactory"
language="Kp"/>
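The important change is the attribute name: StemmerOverrideFilterFactory reads its override file from dictionary, not words. With your current definition no overrides are loaded, so 'etiketje' reaches the Kp stemmer unchanged and is stemmed to 'etik'. SnowballPorterFilterFactory has no words attribute either (its list of words to leave unstemmed is set with protected).
You also do not need ignoreCase="true" on the StemmerOverrideFilter, since the LowerCaseFilter before it already lowercases every token.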
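For reference, here is a sketch of the complete fieldType with only those two filter lines changed; everything else is copied from the question. It assumes lang/stemdict_nl.txt uses the tab-separated word<TAB>stem format that StemmerOverrideFilterFactory reads, and that all entries are lowercase (as in the rules shown in the question).

```xml
<!-- Question's text_nl type with the attribute fixes applied (a sketch, not tested
     against this exact index). Assumes lang/stemdict_nl.txt lines are word<TAB>stem. -->
<fieldType name="text_nl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <!-- Lowercase first, so the override dictionary can be matched without ignoreCase -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- 'dictionary' is the attribute this factory reads; a matched token is replaced
         by its stem and marked as a keyword -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="lang/stemdict_nl.txt"/>
    <!-- Keyword-marked tokens are skipped by the Snowball stemmer -->
    <filter class="solr.SnowballPorterFilterFactory" language="Kp"/>
  </analyzer>
</fieldType>
```

With this definition the analysis page should show 'etiketje' replaced by 'etiket' at the StemmerOverrideFilter stage and then left unchanged by the Kp stemmer, because the override marks the token as a keyword.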
Upvotes: 1