Reputation: 1856
I am trying to create a very simple Solr application that indexes a title and an id. I want searches on title to be case-insensitive, so I added the LowerCaseFilterFactory filter, but somehow it is not working. I also want stemming support.
Below is my schema file.
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<field name="_version_" type="long" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="false"/>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<uniqueKey>id</uniqueKey>
<field name="title" type="text" indexed="true" stored="true"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
</schema>
Any pointer will be highly appreciated.
Thanks in advance.
Upvotes: 0
Views: 473
Reputation: 8658
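You are using KeywordTokenizerFactory for indexing and WhitespaceTokenizerFactory for querying, and the two produce different output.
KeywordTokenizerFactory does not tokenize at all: it keeps the entire field value as a single token.
WhitespaceTokenizerFactory splits the text into tokens at whitespace.
So a title such as "Solr In Action" is indexed as the single term "solr in action" (before stemming), while a query for "solr" produces the term "solr", and the two never match.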
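PorterStemFilterFactory applies the Porter stemming algorithm, a normalization process that removes common endings from words.
Example: "riding", "rides", "horses" ==> "ride", "ride", "hors".
Your schema applies it only in the index analyzer, so stemmed terms in the index may not match the unstemmed terms produced at query time.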
You can fix this by modifying the fieldType.
How you define it depends on how you want search to behave: you can build a custom fieldType, or use one of the fieldTypes already defined in the example schema.xml.
For your title field, you could try a fieldType like the one below:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
Or, if you also want stemming, apply the stemmer at both index and query time:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
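Since the second variant uses the same chain for indexing and querying, it can also be written with a single analyzer. This is only a sketch of the same idea, relying on the fact that an analyzer element without a type attribute is used for both index and query analysis; the filter choice simply mirrors the fieldType above.

```xml
<!-- Sketch: one analyzer shared by indexing and querying (no "type" attribute) -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split the title into tokens at whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- lowercase every token so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- reduce tokens to their stems, e.g. "rides" becomes "ride" -->
    <filter class="solr.PorterStemFilterFactory" />
  </analyzer>
</fieldType>
```

Whichever variant you pick, documents that are already indexed keep the terms produced by the old index analyzer, so reindex them after changing the schema.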
Upvotes: 2