Travis Parks
Travis Parks

Reputation: 8695

ANTLR4 Ambiguity parsing SQL BETWEEN followed by an AND

I am trying to parse this expression (based on SQLite): 0 BETWEEN 0 AND 2 AND 1 < 9.

Here is the relevant portion:

expression
    : term #termExpression
    | left=expression '<' right=expression #lessThanExpression
    | value=expression NOT? BETWEEN lower=expression AND upper=expression #betweenExpression
    | left=expression AND right=expression #andExpression
    | left=expression OR right=expression #orExpression
    ;

Assume a term is a number (e.g., 0, 1, 2, and 9).

The actual SQLite parser seems to correctly parse it as so: (0 BETWEEN 1 AND 2) AND 1 < 9. However, my parser is treating it like 0 BETWEEN (1 AND 2) AND 1 < 9.

enter image description here

What's the best way to break this ambiguity?

In SQLite, a lot is considered an "expression", including comparisons, arithmetic expressions, AND/OR/NOT, BETWEEN, etc. A boolean can validly appear in the BETWEEN range because it gets evaluated as a number (0 or 1).

There's actually many other alternatives (49 total) mapping to expression in the real grammar.

Upvotes: 0

Views: 80

Answers (1)

Joel Coehoorn
Joel Coehoorn

Reputation: 415725

There's no ambiguity. This is covered in section 2 of the SQLite language reference.

Here, we see BETWEEN has higher precedence than AND, and the full operator expression is described like this:

[expr] BETWEEN [expr] AND [expr] 

Where [expr] is further qualified like so:

"[expr]" denotes operand locations for non-binary operators

(emphasis mine)

Because AND is a binary operator, it cannot itself be part of the BETWEEN expression.

Therefore, the only correct interpretation is the SQLite parser, and your own parser's interpretation does not conform to the spec.

Looking at the image in the post, the flaw is expressionandExpression and expressionlessThanExpression are not valid in those positions, because they represent binary operators.


Note that other binary operators with higher precedence are allowed, because the parser will first reduce them. For example, this is okay:

1 BETWEEN 0 + 1 AND 2 

...because + has higher precedence than BETWEEN, so when the BETWEEN expression evaluates it is already reduced to this:

1 BETWEEN 1 AND 2

Upvotes: 1

Related Questions