zod

Reputation: 2788

Configure SOLR to find documents if the plural is used in the document, and the singular in the search text?

I am using Solr, set up at localhost:8983. I am basically using the out-of-the-box example. I have indexed one document with the name "Car" and another with the name "Cars".

If I visit either:

http://localhost:8983/solr/select?q=Car

or

http://localhost:8983/solr/select?q=Cars

I would expect to get both documents. At the moment, I don't.

In the fields section of "schema.xml", the "name" field is of type "text_general".

"text_general" has the following analyzers (before adding any stemmer):

<analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

I tried adding a stemmer to each analyzer, trying each of these in turn:

<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.KStemFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>

With any of these stemmers in place, searching for "Cars" finds "Car", but I can never find "Cars".
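For reference, a sketch of the kind of change described above. It assumes the stock text_general field type from the example schema (the fieldType attributes shown are the example schema's usual ones, not taken from the question) and uses KStemFilterFactory as the stemmer. The stemmer goes after LowerCaseFilterFactory so that it sees lowercased tokens, and it appears in both analyzers so that "Car" and "Cars" reduce to the same term at index time and at query time.

```xml
<!-- Sketch: text_general with an English stemmer added to BOTH analyzers.
     fieldType attributes assumed from the example schema. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stem after lowercasing: "cars" is indexed as "car" -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Same stemmer at query time: the query "Cars" becomes "car" -->
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
```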

Should it be possible to find "Cars"?

Any help would be greatly appreciated. Thank you.

Upvotes: 10

Views: 10821

Answers (2)

Fuxi

Reputation: 5488

It is possible. Just add a Porter stemming filter at the end of the analyser chain (after LowerCaseFilterFactory):

<filter class="solr.SnowballPorterFilterFactory" language="English" />

Read more:

  1. Snowball docs with example of use in analyser
  2. Solr LanguageAnalysis
  3. The English (Porter2) stemming algorithm

Unless you have a specific need, I would not split the analyser into separate index-time and query-time analysers. Your query-time analyser looks perfectly good to use in both cases.
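A sketch of what that suggestion could look like: a single analyser with no type attribute, which Solr applies at both index and query time, built from the question's query-time chain with SnowballPorterFilterFactory appended last. The fieldType attributes are assumed from the example schema. Index-time analysis runs only when documents are added, so the documents have to be re-indexed after the schema change (after reloading the core or restarting Solr). A likely explanation for the symptom in the question, though the asker does not confirm it, is exactly this: an index built before the stemmer was added still holds "cars", so a query stemmed to "car" only matches the "Car" document.

```xml
<!-- Sketch: one analyser (no type attribute) used for both indexing and querying.
     fieldType attributes assumed from the example schema; re-index after changing. -->
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Porter2 (Snowball English) stemming last, on lowercased tokens -->
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
```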

Upvotes: 21

Jules

Reputation: 11

I found that changing the field type from text_general to text_en in the schema.xml field definitions took care of this plural problem.
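A sketch of that change, assuming the field in question is the example schema's "name" field; the indexed and stored attributes are typical values, not taken from the answer. As far as I know, the text_en type in the stock example schema already includes an English stemmer (PorterStemFilterFactory), which would explain why switching types helps. Existing documents still need re-indexing for the new analysis to apply to them.

```xml
<!-- Sketch: point the "name" field at text_en instead of text_general.
     indexed/stored values are assumed; re-index after changing. -->
<field name="name" type="text_en" indexed="true" stored="true"/>
```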

Upvotes: 1
