Escaping for Lucene is not working

Question

I am crafting a Lucene query for a document where the "id" field is equal to "ID:123456:789".

I am passing "ID:123456:789" through QueryParser.escape, which I have confirmed adds an escape before each colon in the string. I have tried two different ways to create a query from the escaped string, but both fail to build a valid query that matches documents whose id field is equal to "ID:123456:789".

For both methods I am using the escaped version of the string to craft the query:

String escapedSearchTerm = QueryParser.escape("ID:123456:789"); // escapedSearchTerm = "ID\:123456\:789"

Method 1: (the second colon in the resulting query becomes a space)

QueryParser parser = new QueryParser("id", new StandardAnalyzer());
Query query = parser.parse(escapedSearchTerm);
System.out.println(query.toString(field)); // ID:123456 789 (second colon becomes a space)

Method 2: (both colons in the resulting query become spaces)

Query query = (new QueryBuilder(analyzer)).createPhraseQuery("id", escapedSearchTerm);
System.out.println(query.toString(field)); // ID 123456 789 (both colons become a space)

As you can see, neither of these methods yields the desired query. How can I build a query that matches documents whose id field is exactly equal to the string "ID:123456:789"?

femtoRgon · Accepted Answer

QueryParser.escape is designed to escape query syntax. It is not intended to bypass analysis. In the cases you've shown, you are using StandardAnalyzer. The string "ID:123456:789" will be tokenized by the analyzer into three terms: "id", "123456", "789". If you are not using StandardAnalyzer at index time, you should use the appropriate analyzer when constructing your query.

For example:

QueryParser parser = new QueryParser("default", new StandardAnalyzer()); // "default" is the default field
Query query = parser.parse("myfield:ID:123456:789");

This results in a syntax error, because the unescaped colons after "ID" are read as query syntax rather than as part of the term.

QueryParser parser = new QueryParser("default", new StandardAnalyzer()); // "default" is the default field
Query query = parser.parse("myfield:" + QueryParser.escape("ID:123456:789"));

results in "myfield:id myfield:123456 myfield:789". The colons have been escaped correctly, but are then removed by analysis. Note the difference between this and

Query query = parser.parse("myfield:ID 123456 789");

Which results in "myfield:id default:123456 default:789".
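The two stages involved, escaping query syntax and then analyzing the term text, can be modeled without Lucene at all. The following is only a rough sketch: escapeSyntax, unescape, and analyze are hypothetical stand-ins written for illustration, the list of special characters is an assumption, and the tokenizer is far cruder than StandardAnalyzer, which follows Unicode text segmentation rules. It only shows why the colons survive escaping but not analysis.

```java
import java.util.ArrayList;
import java.util.List;

class EscapeVsAnalysis {

    // Hypothetical stand-in for QueryParser.escape: put a backslash in front
    // of each query-syntax character (the character list is an assumption).
    static String escapeSyntax(String s) {
        String special = "\\+-!():^[]\"{}~*?|&/";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (special.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Model of what the parser does with escapes: "\:" becomes a literal ":"
    // inside the term text instead of acting as a field separator.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Crude stand-in for an analyzer: split on anything that is not a letter
    // or digit, and lowercase the pieces.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String escaped = escapeSyntax("ID:123456:789");
        System.out.println(escaped);                    // ID\:123456\:789
        System.out.println(analyze(unescape(escaped))); // [id, 123456, 789]
    }
}
```

The escaped string gets past the syntax stage intact, but the analysis stage still splits it into three terms, which is why escaping alone cannot produce an exact match on an analyzed field.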

If your field is analyzed:

Then a Phrase query is probably the solution you are looking for:

QueryParser parser = new QueryParser("default", new StandardAnalyzer());
Query query = parser.parse("myfield:\"ID:123456:789\"");

If your field is not analyzed:

You can use a KeywordAnalyzer in your QueryParser:

QueryParser parser = new QueryParser("default", new KeywordAnalyzer());
Query query = parser.parse("myfield:" + QueryParser.escape("ID:123456:789"));

Or you could construct a TermQuery, instead:

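// TermQuery takes a Term (org.apache.lucene.index.Term) and bypasses both the parser and analysis
Query query = new TermQuery(new Term("myfield", "ID:123456:789"));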
