Reputation: 1174
I have a list of words, for example:
'today
today
t-oday
t oday
t/oda y
How can I retrieve all of these words from a Lucene index if I search for today, t/oday, or 'today?
I actually want the search to be insensitive to ampersand, dash, space and some other characters.
What is the best way to deal with this situation? Should I write my own analyzer/tokenizer, or is there something existing I can use to perform this search?
I'm using Hibernate Search.
Upvotes: 0
Views: 288
Reputation: 33351
Adding a CharFilter to your analyzer would probably be the best solution. This allows you to preprocess the input before the tokenizer is even applied. There are some TokenFilter examples in the Hibernate documentation (see example 4.13).
I'd recommend using a MappingCharFilterFactory and defining a mapping that strips the characters you aren't interested in.
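To make the effect of such a mapping concrete, here is a minimal plain-Java sketch of the normalization step. It does not use the Lucene or Hibernate Search API. The class name StripCharsSketch, the normalize method, and the exact set of stripped characters are all assumptions for illustration. It only shows that, once the unwanted characters are mapped to the empty string, every variant from the question collapses to the same term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: mimics what a character mapping applied before
// tokenization would do. This is NOT the Lucene MappingCharFilter itself.
public class StripCharsSketch {

    // Hypothetical mapping: each key is replaced by its value.
    // Mapping a character to "" removes it from the input.
    static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("'", "");
        MAPPING.put("-", "");
        MAPPING.put("/", "");
        MAPPING.put("&", "");
        MAPPING.put(" ", "");
    }

    // Applies every mapping entry to the input, in insertion order.
    static String normalize(String input) {
        String out = input;
        for (Map.Entry<String, String> entry : MAPPING.entrySet()) {
            out = out.replace(entry.getKey(), entry.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] variants = {"'today", "today", "t-oday", "t oday", "t/oda y"};
        for (String variant : variants) {
            // Each variant is reduced to the same normalized form.
            System.out.println(variant + " -> " + normalize(variant));
        }
    }
}
```

In a real setup the same analyzer would need to run at both index time and query time, so that the stored terms and the search terms are normalized the same way.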
Stripping all the spaces from the input seems a rather unusual case to me, since that will likely prevent useful tokenization, but I'll assume you have taken that into consideration.
Upvotes: 0