Reputation: 99
I am trying to index documents for case insensitive search using KeywordTokenizer.
I have created a custom Analyzer that is supposed to do keyword tokenisation as well as convert all keywords to lowercase:
public class LowercasingKeywordAnalyzer extends Analyzer {
@Override
protected TokenStreamComponents createComponents(String fieldName) {
KeywordTokenizer keywordTokenizer = new KeywordTokenizer();
return new TokenStreamComponents(keywordTokenizer, new LowerCaseFilter(keywordTokenizer));
}
}
Why does search return no results when I am submitting TermQuery with all criteria terms lowecased?? Here is a unit test reproducing the issue:
@Test
public void experiment() throws IOException, ParseException {
Analyzer analyzer = new LowercasingKeywordAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new StringField("fieldname", text, Store.NO));
iwriter.addDocument(doc);
iwriter.close();
// Now search the index:
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
//THE TEST PASSES WITH THE CASE SENSITIVE QUERY TERM, BUT DOES NOT PASS WITH LOWERCASED
//Query query = new TermQuery(new Term("fieldname", "This is the text to be indexed."));
Query query = new TermQuery(new Term("fieldname", "This is the text to be indexed.".toLowerCase()));
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
assertEquals(1, hits.length);
ireader.close();
directory.close();
}
Please help me to identify what is wrong here?
NOTE: I am aware of Lucene QueryParsers as well as deprecation of some interfaces, please do not bother commenting on this.
Upvotes: 0
Views: 1100
Reputation: 33351
StringField
is not analyzed. No analyzer you define will affect it. You can use a TextField
instead, or a Field
where you can define your own FieldType
. Or just lowercase it before constructing the field and continue to use StringField
.
Upvotes: 1