32kda

Reputation: 63

How do I prevent Lucene from tokenizing a query string containing '/' or '-'?

Good day!

In my documents I have a "date" field containing an ISO-8601 date, which can also be a period such as "25-08-2016/P1D".

I want to find documents having exactly this date or period, i.e. the same value in the "date" field. Unfortunately, I was unable to do this. I tried different query strings, with and without escaping, without success.

What am I doing wrong? How can I tell Lucene to search this field using a simple string match, without any tokenization?

Upvotes: 0

Views: 960

Answers (1)

32kda

Reputation: 63

After some research I found that escaping the query string is the wrong approach; the correct way to achieve this is to customize the query analyzer for the field in question ("date" in my case):

// Use WhitespaceAnalyzer for the "date" field, StandardAnalyzer for all other fields.
Map<String, Analyzer> analyzerPerField = new HashMap<>();
analyzerPerField.put("date", new WhitespaceAnalyzer());
Analyzer analyzer = new PerFieldAnalyzerWrapper(
        new StandardAnalyzer(), analyzerPerField);
QueryParser parser = new QueryParser("title", analyzer);

In this code we use a WhitespaceAnalyzer (which divides the query only at whitespace) for the "date" field, while the other fields keep the StandardAnalyzer, which divides text at punctuation and would therefore break an ISO-8601 date into several tokens. WhitespaceAnalyzer leaves the ISO-8601 date intact as a single token.
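To see the difference between the two analyzers directly, the sketch below (assuming Lucene 5+ package layout with lucene-core, lucene-analyzers-common, and lucene-queryparser on the classpath; the class name `TokenizeDemo` and helper `tokens` are illustrative, not from the original post) prints the tokens each analyzer produces for the period value:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenizeDemo {

    // Collect the tokens an analyzer produces for the given text.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("date", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                result.add(term.toString());
            }
            ts.end();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String value = "25-08-2016/P1D";
        // StandardAnalyzer splits at '-' and '/', producing several tokens.
        System.out.println(tokens(new StandardAnalyzer(), value));
        // WhitespaceAnalyzer keeps the whole value as one token: [25-08-2016/P1D]
        System.out.println(tokens(new WhitespaceAnalyzer(), value));
    }
}
```

Because the whitespace-analyzed field keeps the value as a single term, a query for that exact value can match it.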

For additional details about custom analysis and tokenization in Lucene, see e.g. http://www.hascode.com/2014/07/lucene-by-example-specifying-analyzers-on-a-per-field-basis-and-writing-a-custom-analyzertokenizer/

Upvotes: 1
