Reputation: 2834
I'm creating a grammar for a subset of Java, and seem to be running into an issue in which my grammar is considered ambiguous by ANTLR (or well, I think that's the reasoning behind the error at least).
Here is the relevant portion of my grammar:
expr : (op=MINUS | op=NOT) expr exprPrime -> ^(Expr $op expr exprPrime)
| NEW ID OPEN_PAREN CLOSE_PAREN exprPrime -> ^(Expr NEW ID OPEN_PAREN CLOSE_PAREN exprPrime)
| ID exprPrime -> ^(Expr ID exprPrime)
| THIS exprPrime -> ^(Expr THIS exprPrime)
| INTEGER exprPrime -> ^(Expr INTEGER exprPrime)
| NULL exprPrime -> ^(Expr NULL exprPrime)
| TRUE exprPrime -> ^(Expr TRUE exprPrime)
| FALSE exprPrime -> ^(Expr FALSE exprPrime)
| OPEN_PAREN expr CLOSE_PAREN exprPrime -> ^(Expr OPEN_PAREN expr CLOSE_PAREN exprPrime)
;
// remove left recursion via exprPrime
exprPrime : (op=PLUS | op=MINUS | op=MULT | op=DIV | op=LT | op=LEQ | op=GT | op=GEQ | op=EQ | op=NEQ | op=AND | op=OR | op=NOT) expr exprPrime
-> ^(ExprPrime $op expr exprPrime)
| DOT ID OPEN_PAREN (expr (COMMA expr)*)? CLOSE_PAREN exprPrime
-> ^(ExprPrime DOT ID OPEN_PAREN (expr (COMMA expr)*)? CLOSE_PAREN exprPrime)
|
-> ^(Epsilon)
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
CLASS : 'class' ;
PUBLIC : 'public' ;
STATIC : 'static' ;
EXTENDS : 'extends' ;
NEW : 'new' ;
THIS : 'this' ;
NULL : 'null' ;
IF : 'if' ;
ELSE : 'else' ;
WHILE : 'while' ;
MAIN : 'main' ;
TRUE : 'true' ;
FALSE : 'false' ;
RETURN : 'return' ;
SYSO : 'System.out.println' ;
OPEN_BRACKET : '{' ;
CLOSE_BRACKET : '}' ;
OPEN_SQUARE : '[' ;
CLOSE_SQUARE : ']' ;
OPEN_PAREN : '(' ;
CLOSE_PAREN : ')' ;
ASSIGN : '=' ;
COMMA : ',' ;
DOT : '.' ;
SEMICOLON : ';' ;
STRING_TYPE : 'String' ;
INTEGER_TYPE : 'int' ;
VOID_TYPE : 'void' ;
BOOLEAN_TYPE : 'boolean' ;
PLUS : '+' ;
MINUS : '-' ;
MULT : '*' ;
DIV : '/' ;
LT : '<' ;
LEQ : '<=' ;
GT : '>' ;
GEQ : '>=' ;
EQ : '==' ;
NEQ : '!=' ;
AND : '&&' ;
OR : '||' ;
NOT : '!' ;
ID : LETTER (LETTER | DIGIT)* ;
INTEGER : (NON_ZERO_DIGIT DIGIT*) | ZERO ;
WHITESPACE : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ { $channel = HIDDEN; } ;
COMMENT : (('/*' .* '*/') | ('//' .* '\n')) { $channel = HIDDEN; } ;
fragment ZERO : '0' ;
fragment DIGIT : '0'..'9' ;
fragment NON_ZERO_DIGIT : '1'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
And here is the error I get when creating my grammar:
warning(200): MiniJava.g:101:11: Decision can match input such as "NEQ" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "EQ" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "MULT" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "DIV" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "GEQ" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "NOT" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "LT" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "LEQ" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "DOT" using multiple alternatives: 2, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "OR" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "PLUS" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "AND" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "MINUS" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): MiniJava.g:101:11: Decision can match input such as "GT" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
The line numbers all point to the line with exprPrime's definition (I left out most of the gramar above it for brevity, let me know if I need to post more of it). The exprPrime itself was created to avoid left recursion in 'expr'.
Any idea how I can change my grammar to remove the ambiguities? I'm not sure I even understand what the ambiguities are.
Upvotes: 2
Views: 839
Reputation: 170148
Your grammar can be minimized to show a ambiguity about it:
grammar T; // line 1
// 2
expr // 3
: MINUS expr exprPrime // 4
; // 5
// 6
exprPrime // 7
: MINUS expr exprPrime // 8
| // epsilon // 9
; // 10
// 11
MINUS : '-'; // 12
INTEGER : '0'..'9'+; // 13
Generating a parser from this grammar would result in the following warnings:
[20:38:15] warning(200): T.g:8:2:
Decision can match input such as "MINUS" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[20:38:15] error(201): T.g:8:2: The following alternatives can never be matched: 2
The first part indicates that on line 8 of the grammar, the generated parser can take both "paths" to match the token MINUS
(both the alternatives on line 8 and 9 match MINUS
!). That's the ambiguity (and in your grammar, there are many, many more). The second part informs you that due to this ambiguity, the generated parser will not be able to take the 2nd alternative (that path is disabled).
One other thing: by matching all the operators in the same alternative, you have no means of making *
have a higher precedence than, say, -
. And all the right-recursive productions make the grammar hard to comprehend. I'd do it something like this instead:
// ...
tokens {
UNARY_MINUS;
PARAMS;
INVOKE;
}
parse
: expr EOF
;
expr
: or_expr
;
or_expr
: and_expr (OR^ and_expr)*
;
and_expr
: rel_expr (AND^ rel_expr)*
;
rel_expr
: eq_expr ((LT | GT | LEQ | GEQ)^ eq_expr)?
;
eq_expr
: add_expr ((EQ | NEQ)^ add_expr)?
;
add_expr
: mult_expr ((PLUS | MINUS)^ mult_expr)*
;
mult_expr
: unary_expr ((MULT | DIV)^ unary_expr)*
;
unary_expr
: MINUS atom -> ^(UNARY_MINUS atom)
| atom
;
atom
: NEW ID OPEN_PAREN params CLOSE_PAREN -> ^(NEW ID params)
| (ID -> ID) (invoke -> ^(ID invoke))?
| THIS
| INTEGER
| NULL
| TRUE
| FALSE
| OPEN_PAREN expr CLOSE_PAREN -> expr
;
invoke
: DOT ID OPEN_PAREN params CLOSE_PAREN invoke? -> ^(INVOKE ID params invoke?)
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
// ...
No ambiguity, and all operators have the proper precedence (OR
being the lowest, and the unary operators the highest (atoms even higher, of course...)). Parsing input like "x.foo(9,y)&&(1==2||true)+2+3+4==333"
with the expr
rule, would result in the following AST:
Upvotes: 3
Reputation: 13941
The ambiguity lies in the fact that the third alternative of exprPrime
will match anything - the empty rule always succeeds and never advances the input. To eliminate the ambiguity, you could remove that alternative from the definition, and replace all occurrences of exprPrime
with exprPrime?
.
Upvotes: 0