Eugeny89

Reputation: 3741

How to make Lucene case-insensitive

By default, the words "Word" and "word" are not treated as the same. How can I make Lucene case-insensitive?

Upvotes: 13

Views: 32779

Answers (4)

Luchio

Reputation: 53

In addition to using StandardAnalyzer, which includes a LowerCaseFilter and a filter for common English stop words (such as "the"), make sure you build your documents with TextField rather than StringField. A StringField is indexed as a single, unanalyzed term and is meant for exact matches, so it is never lowercased.
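
To make the difference concrete, here is a minimal plain-Java model (no Lucene dependency; the class and method names are hypothetical) of what each kind of field puts into the index. The "analyzed" path only approximates a tokenizer followed by a LowerCaseFilter; it is not a token-for-token copy of StandardAnalyzer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java model (not the Lucene API) of how the two field types index a value.
class FieldIndexingModel {

    // StringField-like: the whole value becomes one term, untouched by any analyzer.
    static Set<String> exactTerms(String value) {
        Set<String> terms = new HashSet<>();
        terms.add(value);
        return terms;
    }

    // TextField-like: split on runs of non-letter/non-digit characters and
    // lowercase each token, roughly what a tokenizer plus LowerCaseFilter produce.
    static Set<String> analyzedTerms(String value) {
        Set<String> terms = new HashSet<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String value = "Hello Word";
        String query = "word"; // already lowercased, as an analyzed query term would be
        System.out.println(exactTerms(value).contains(query));    // false
        System.out.println(analyzedTerms(value).contains(query)); // true
    }
}
```

The exact-match field only matches the full original string "Hello Word", while the analyzed field matches "word" regardless of how it was capitalized in the document.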

Upvotes: 3

Bhawna Singh

Reputation: 9

If you are using Solr, add LowerCaseFilterFactory to the fieldType of that field in schema.xml. For example:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>

    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

Upvotes: 0

WhiteFang34

Reputation: 72079

StandardAnalyzer applies a LowerCaseFilter, which makes "Word" and "word" the same term. Pass that same analyzer to both your IndexWriter and your QueryParser, so that indexed text and queries are lowercased in the same way. For example, a few lines of Lucene 3.0 code:

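// Lucene 3.0 API: dir is a Directory, mlf is an IndexWriter.MaxFieldLength, field is the default search field
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
IndexWriter writer = new IndexWriter(dir, analyzer, true, mlf);
QueryParser parser = new QueryParser(Version.LUCENE_30, field, analyzer);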

Upvotes: 7

Johan Sjöberg

Reputation: 49237

The easiest approach is to lowercase all searchable content as well as the queries; see the LowerCaseFilter documentation. You could also use wildcard queries for case-insensitive search, since they bypass the Analyzer.
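
A minimal sketch of this approach, assuming a toy in-memory inverted index rather than Lucene itself (the LowercaseIndex class and its methods are hypothetical): the same normalize step is applied when adding documents and when searching, and searchRaw shows that skipping it on the query side breaks the match.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not Lucene) that applies the same
// lowercasing to documents at index time and to query terms at search time.
class LowercaseIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Shared normalization step, playing the role of the analyzer.
    static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Splits on whitespace and records the lowercased form of each token.
    void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(normalize(token), k -> new TreeSet<>()).add(docId);
        }
    }

    // Normalizes the query term the same way before looking it up.
    Set<Integer> search(String queryTerm) {
        return postings.getOrDefault(normalize(queryTerm), new TreeSet<>());
    }

    // Looks the term up exactly as typed, i.e. without query-side lowercasing.
    Set<Integer> searchRaw(String queryTerm) {
        return postings.getOrDefault(queryTerm, new TreeSet<>());
    }

    public static void main(String[] args) {
        LowercaseIndex index = new LowercaseIndex();
        index.add(1, "Word count");
        index.add(2, "another word");
        System.out.println(index.search("WORD"));    // [1, 2]
        System.out.println(index.searchRaw("WORD")); // []
    }
}
```

The key design point is that one normalization function serves both sides; in Lucene the equivalent is giving the IndexWriter and the QueryParser the same analyzer.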

If you need more than one case behavior, you can store the same content in different fields, each with its own case handling (for example, one lowercased and one left as-is).
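
A hypothetical plain-Java sketch of the multi-field idea: two maps stand in for a case-preserving field and a lowercased field, and the caller picks which one to query. In a real Lucene index these would be two fields analyzed differently; the class and method names here are illustrative only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: each document's text is indexed twice, once as-is and
// once lowercased, so each query can choose case-sensitive or case-insensitive matching.
class TwoFieldIndex {
    private final Map<String, Set<String>> exact = new HashMap<>();           // term -> doc ids, case preserved
    private final Map<String, Set<String>> caseInsensitive = new HashMap<>(); // term -> doc ids, lowercased

    void add(String docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;
            exact.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
            caseInsensitive.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new HashSet<>()).add(docId);
        }
    }

    // Queries the case-preserving map or the lowercased one, depending on the flag.
    Set<String> search(String term, boolean caseSensitive) {
        if (caseSensitive) {
            return exact.getOrDefault(term, new HashSet<>());
        }
        return caseInsensitive.getOrDefault(term.toLowerCase(Locale.ROOT), new HashSet<>());
    }

    public static void main(String[] args) {
        TwoFieldIndex index = new TwoFieldIndex();
        index.add("a", "Word");
        index.add("b", "word");
        System.out.println(index.search("Word", true).size());  // 1
        System.out.println(index.search("Word", false).size()); // 2
    }
}
```

The cost of this design is a larger index, since every token is stored twice; the benefit is that case sensitivity becomes a per-query choice instead of a fixed indexing decision.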

Upvotes: 12
