Elixir Techne

Reputation: 1856

Case-insensitive search in Solr 5.5

I am trying to create a very simple Solr application where I will index title and id. I want to search the title case-insensitively and have used the LowerCaseFilterFactory filter, but somehow it is not working. I want stemming support in search too.

Below is my schema file.

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.5">

   <field name="_version_" type="long" indexed="true" stored="true"/>
   <field name="_root_" type="string" indexed="true" stored="false"/>
   <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

 <uniqueKey>id</uniqueKey>
 <field name="title" type="text" indexed="true" stored="true"/>
 <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
</fieldType>
</schema>

Any pointers would be highly appreciated.

Thanks in advance.

Upvotes: 0

Views: 473

Answers (1)

Abhijit Bashetti

Reputation: 8658

You are using "KeywordTokenizerFactory" for indexing and "WhitespaceTokenizerFactory" for querying.

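The two tokenizers produce different output, so the terms stored in the index do not match the terms generated from the query.

KeywordTokenizerFactory treats the entire field value as a single token; it does not split the text at all. A multi-word title is therefore indexed as one term.

WhitespaceTokenizerFactory, by contrast, splits the text at whitespace, so a multi-word query becomes several separate terms, none of which equals the single indexed term.

PorterStemFilterFactory applies the Porter stemming algorithm, a normalization process that removes common endings from words. In your schema it runs only at index time, so stemmed index terms are compared against unstemmed query terms.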

Example: "riding", "rides", "horses" ==> "ride", "ride", "hors".

How you define the field type depends on how you want search to behave: you can build a custom fieldType or reuse one of the field types already defined in the example schema.xml.

For your title field, you can try a fieldType like the one below, which gives case-insensitive matching without stemming:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
        </analyzer>
</fieldType>

Or, if you also want stemming, apply PorterStemFilterFactory in both the index and query analyzers:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.PorterStemFilterFactory" />
        </analyzer>
        <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.PorterStemFilterFactory" />
        </analyzer>
</fieldType>
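
When the index and query chains are identical, as in the stemming variant, they can also be written as a single analyzer with no type attribute, which Solr applies at both index and query time. The sketch below assumes that is what you want; the name "text_stem" is hypothetical and chosen only for illustration, and the title field is shown re-pointed at it.

```xml
<!-- "text_stem" is a hypothetical name, not part of the original schema. -->
<fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
        <!-- No type attribute: this one analyzer is used for both indexing and querying. -->
        <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.PorterStemFilterFactory" />
        </analyzer>
</fieldType>

<!-- Replaces the existing title field definition so it uses the new type. -->
<field name="title" type="text_stem" indexed="true" stored="true"/>
```

Keeping one shared chain avoids the index/query mismatch from the question by construction. After changing the analysis chain, documents that were already indexed need to be reindexed before the new behavior shows up in search results.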

Upvotes: 2
