Antoine Dahan

Reputation: 713

Escaping for Lucene is not working

I am crafting a Lucene query for documents whose "id" field is equal to "ID:123456:789".

I am passing "ID:123456:789" through QueryParser.escape, which I have confirmed adds an escape before each colon in the string. I have tried two different ways to create a query out of the escaped string, but both fail to build a valid query that matches documents whose id field is equal to "ID:123456:789".


For both methods I am using the escaped version of the string to craft the query:

String escapedSearchTerm = QueryParser.escape("ID:123456:789"); // escapedSearchTerm = "ID\:123456\:789"


Method 1: (the second colon in the resulting query becomes a space)

QueryParser parser = new QueryParser("id", new StandardAnalyzer());
Query query = parser.parse(escapedSearchTerm);
System.out.println(query.toString("id")); // ID:123456 789 (second colon becomes a space)


Method 2: (both colons in the resulting query become spaces)

Query query = (new QueryBuilder(analyzer)).createPhraseQuery("id", escapedSearchTerm);
System.out.println(query.toString("id")); // ID 123456 789 (both colons become a space)


As you can see, neither of these methods yields the desired query. How can I build a query to match documents with an id field exactly equal to the string "ID:123456:789"?

Upvotes: 1

Views: 1104

Answers (1)

femtoRgon

Reputation: 33351

QueryParser.escape is designed to escape query syntax. It is not intended to bypass analysis. In the cases you've shown, you are using StandardAnalyzer. The string "ID:123456:789" will be tokenized by the analyzer into three terms: "id", "123456", "789". If you are not using StandardAnalyzer at index time, you should use the appropriate analyzer when constructing your query.

For example:

QueryParser parser = new QueryParser("default", new StandardAnalyzer());
Query query = parser.parse("myfield:ID:123456:789");

This results in a syntax error, because the unescaped colons after "ID" are read as query syntax (the field separator) rather than as part of the term.

QueryParser parser = new QueryParser("default", new StandardAnalyzer());
Query query = parser.parse("myfield:" + QueryParser.escape("ID:123456:789"));

results in "myfield:id myfield:123456 myfield:789". The colons have been escaped correctly, but are then removed by analysis. Note the difference between this and

Query query = parser.parse("myfield:ID 123456 789");

Which results in "myfield:id default:123456 default:789".
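
The two stages described above (escaping query syntax, then analysis) can be imitated in plain Java without Lucene on the classpath. This is only a rough sketch: the class EscapeVsAnalysisSketch and its escape, unescape and analyze helpers are hypothetical stand-ins, the list of escaped characters is my understanding of what QueryParser.escape covers, and the toy analyze method is far cruder than StandardAnalyzer. It simply lowercases and splits on anything that is not a letter or digit.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: these helpers imitate, but are not, Lucene's classes.
class EscapeVsAnalysisSketch {

    // Characters treated as query syntax in this sketch
    // (assumed to match what QueryParser.escape handles).
    private static final String SYNTAX_CHARS = "\\+-!():^[]\"{}~*?|&/";

    // Put a backslash in front of every syntax character.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SYNTAX_CHARS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical stand-in for the parser step that reads "\:" as a literal ":".
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                c = s.charAt(++i); // keep the escaped character, drop the backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String escaped = escape("ID:123456:789");
        System.out.println(escaped);                    // ID\:123456\:789
        System.out.println(analyze(unescape(escaped))); // [id, 123456, 789]
    }
}
```

The escaped colons survive parsing as literal characters, but the analysis step still discards them, which is why escaping alone cannot produce an exact match on "ID:123456:789".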

If your field is analyzed:

Then a Phrase query is probably the solution you are looking for:

QueryParser parser = new QueryParser("default", new StandardAnalyzer());
Query query = parser.parse("myfield:\"ID:123456:789\"");

If your field is not analyzed:

You can use a KeywordAnalyzer in your QueryParser:

QueryParser parser = new QueryParser("default", new KeywordAnalyzer());
Query query = parser.parse("myfield:" + QueryParser.escape("ID:123456:789"));

Or you could construct a TermQuery, instead:

Query query = new TermQuery(new Term("myfield", "ID:123456:789"));

Upvotes: 3
