Reputation: 7235
I have this stemmed field:
<fieldtype name="textes" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-es.txt" enablePositionIncrements="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="Spanish" protected="protwords-es.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="Spanish" protected="protwords-es.txt"/>
</analyzer>
</fieldtype>
The expected result of searching for alquileres (rents) would be a match on alquiler (rent). But when I go to "Field Analysis" in the Solr Admin site and check an index value of alquiler against a query value of alquileres, the following happens:
alquiler gets stemmed into alquil.
alquileres gets stemmed into alquiler.
So the simple case of searching for the plural form of a word (alquileres) does not match its singular form (alquiler).
Shouldn't both the index and the query values be stemmed into the same stem (either alquiler or alquil)? Is this a limitation of the algorithm, or a misunderstanding/misconfiguration on my part?
Upvotes: 4
Views: 2890
Reputation: 1
I use Hunspell (from OpenOffice) and it does an excellent job.
My example:
URL-Elastic/_analyze?analyzer=es_AR&text=alquileres
It returns:
{
  "tokens": [
    {
      "token": "alquiler",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
Link: https://www.openoffice.org/download/index.html
Upvotes: 0
Reputation: 3044
Snowball stemming is very limited. You'd get better results by using a dictionary-based stemmer (Hunspell): http://wiki.apache.org/solr/Hunspell
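As a rough sketch of what that could look like for the field type in the question, the Snowball filter can be swapped for solr.HunspellStemFilterFactory. The file names es_ES.dic and es_ES.aff are assumptions: they stand for whichever Spanish Hunspell dictionary and affix files you place in the core's conf directory. Whether alquiler and alquileres end up with the same stem depends on that dictionary, so it is worth checking again in "Field Analysis" after reindexing.

```xml
<!-- Sketch only: es_ES.dic and es_ES.aff are assumed file names for a Spanish
     Hunspell dictionary copied into the core's conf directory. -->
<fieldtype name="textes" class="solr.TextField">
  <!-- A single analyzer without a type attribute is used for both index and query,
       so both sides go through exactly the same chain. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-es.txt" enablePositionIncrements="true"/>
    <!-- Dictionary-based stemming in place of SnowballPorterFilterFactory. -->
    <filter class="solr.HunspellStemFilterFactory" dictionary="es_ES.dic" affix="es_ES.aff" ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

Using one shared analyzer also avoids the index and query chains drifting apart over time, for example by pointing at different stop word files.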
Upvotes: 1