Reputation: 674
I am building a job search site using Lucene and have run into a problem. I need to search for terms such as C# and .NET, so I need to use WhitespaceAnalyzer, but with it the search is case sensitive.
How can I make it case insensitive? The only solution I can see is to write my own Analyzer, but I am new to Lucene. Could you please help me with some sample code? I wrote something that I think should work, but it does not. Take a look:
public sealed class NewWhitespaceAnalyzer : Analyzer
{
    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }

    public override TokenStream ReusableTokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        Tokenizer tokenizer = (Tokenizer)GetPreviousTokenStream();
        if (tokenizer == null)
        {
            tokenizer = new WhitespaceTokenizer(reader);
            SetPreviousTokenStream(tokenizer);
        }
        else
            tokenizer.Reset(reader);
        return tokenizer;
    }
}
If you see a mistake here, please correct me.
If you have any other suggestions, you are welcome to share them.
Thanks for any help, Dima.
Upvotes: 0
Views: 3346
Reputation: 2113
Try this:
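public sealed class NewWhitespaceAnalyzer : Analyzer
{
    // Holder for the reusable tokenizer/filter pair. It was not defined in the
    // original snippet; this follows the pattern used by Lucene's built-in analyzers.
    private sealed class SavedStreams
    {
        internal Tokenizer tokenStream;
        internal TokenStream filteredTokenStream;
    }

    public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        // Split on whitespace only, then lowercase every token.
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }

    public override TokenStream ReusableTokenStream(System.String fieldName, System.IO.TextReader reader)
    {
        SavedStreams streams = (SavedStreams)GetPreviousTokenStream();
        if (streams == null)
        {
            // First call: build the chain once and cache it.
            streams = new SavedStreams();
            SetPreviousTokenStream(streams);
            streams.tokenStream = new WhitespaceTokenizer(reader);
            streams.filteredTokenStream = new LowerCaseFilter(streams.tokenStream);
        }
        else
        {
            // Later calls: point the cached tokenizer at the new input.
            streams.tokenStream.Reset(reader);
        }
        // Return the filtered stream, not the bare tokenizer, so lowercasing is applied.
        return streams.filteredTokenStream;
    }
}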
Upvotes: 3
Reputation: 96
There are two points:
1. Use LowerCaseFilter in the ReusableTokenStream method as well.
2. Don't forget to use this custom Analyzer for both query parsing and document indexing.
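To illustrate the second point, here is a rough sketch of passing the same analyzer instance to both IndexWriter and QueryParser. It assumes the Lucene.NET 2.9.x API (member names such as totalHits and Close() differ in later versions); the Demo class, the CountHits helper, the "title" field, and the sample text are made up for illustration.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Stripped-down analyzer: split on whitespace, then lowercase each token.
// ReusableTokenStream is left at its default, which falls back to TokenStream.
public sealed class NewWhitespaceAnalyzer : Analyzer
{
    public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
    {
        return new LowerCaseFilter(new WhitespaceTokenizer(reader));
    }
}

// Hypothetical helper class, not part of Lucene.NET.
public static class Demo
{
    public static int CountHits(string queryText)
    {
        Analyzer analyzer = new NewWhitespaceAnalyzer();
        Directory directory = new RAMDirectory();

        // Index one sample document with the custom analyzer.
        IndexWriter writer = new IndexWriter(directory, analyzer, true,
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.Add(new Field("title", "Senior C# .NET Developer",
                          Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        writer.Close();

        // Parse the query with the SAME analyzer so its terms are lowercased too.
        QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29,
                                             "title", analyzer);
        Query query = parser.Parse(queryText);

        IndexSearcher searcher = new IndexSearcher(directory, true);
        int hits = searcher.Search(query, 10).totalHits;
        searcher.Close();
        return hits;
    }
}
```

With this setup, Demo.CountHits("c#") and Demo.CountHits("C#") should both find the sample document, because the indexed tokens and the query terms pass through the same lowercasing chain. If the query were parsed with a different analyzer, the terms could stop matching what was indexed.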
Enjoy.
Upvotes: 0