Reputation: 4767
One of the expressions that can be very ambiguous up until almost the very end is that of a tuple vs. a parenthesized expression. A tuple is differentiated between a parenthesized expression by the presence of a comma -- and often a single-member tuple is not allowed, as it would be ambiguous, for example from BigQuery:
Tuple syntax
(expr1, expr2 [, ... ])
The output type is an anonymous STRUCT type with anonymous fields with types matching the types of the input expressions. There must be at least two expressions specified. Otherwise this syntax is indistinguishable from an expression wrapped with parentheses.
I am having trouble figuring out why my grammar is ambiguous here, which allows for both:
grammar DBParser;
options { caseInsensitive=true; }
statement: select EOF;
select:
'SELECT' expr (',' expr)*
('FROM' expr) ?
('WHERE' expr) ?
;
expr
: '(' expr ')' # parenExpression
| '(' expr (',' expr)+ ')' # tupleLiteralExpression
| expr 'IN' expr # inExpression
| select # subSelectExpression
| Atom # constantExpression
;
Atom:
[a-z-]+ | [0-9]+ | '\'' Atom '\''
;
WHITESPACE: [ \t\r\n] -> skip;
And with my input:
SELECT id FROM sales WHERE country IN ((select 1,1,1,1,1,1,1,1,1),1)
I get the following profiling information from Antlr telling me I have ambiguities.
Why is this occurring, and how would I properly resolve this?
Upvotes: 0
Views: 83
Reputation: 4767
The ambiguity arises from the non-parenthesized sub-select expression. For example if we have:
SELECT a FROM b WHERE x IN (select 1,1)
The IN
expression part can be parsed in two different ways:
Atom inExpression(tupleLiteralExpression(subSelectExpression, Atom))
Or as:
Atom inExpression(subSelectExpression)
Since (SELECT 1,1)
could either be seen as a select clause SELECT 1,1
or it can be seen as a tuple containing two elements, SELECT 1
and 1
.
Because of this, we must require parentheses around the sub-select so we know where the select clause starts and ends. Here would be the proper grammar resolving the ambiguities:
grammar DBParser;
options { caseInsensitive=true; }
statement: select EOF;
select:
'SELECT' expr (',' expr)*
('FROM' expr) ?
('WHERE' expr) ?
;
expr
: '(' expr ')' # parenExpression
| '(' expr (',' expr)+ ')' # tupleLiteralExpression
| expr 'IN' expr # inExpression
| '(' select ')' # subSelectExpression
| Atom # constantExpression
;
Atom:
[a-z-]+ | [0-9]+ | '\'' Atom '\''
;
WHITESPACE: [ \t\r\n] -> skip;
Upvotes: 1