Reputation: 129
I have written this grammar:
expr : multExpr ( ('+' | '-') multExpr )*;
multExpr : atom ( ('*' | '/') atom )*;
atom : INT | FLOAT | ID | '(' expr ')';
condition : cond ('or' cond)*;
cond : c1 ('and' c1)*;
c1 : ('not')? c2;
c2 : '(' condition ')' | boolean;
boolean : expr (relop expr | ²) | 'true' | 'false';
relop : '<' | '<=' | '>' | '>=' | '==' | '!=';
I have omitted the lexer rules for INT,FLOAT,ID as it is obvious.
The problem is c2 rule, it is ambiguous because of '(', I could not find the solution, can you offer me a solution?
Upvotes: 3
Views: 536
Reputation: 490128
You problem stems from the fact that the '(' could be the start of either the first alternative for c2
or the last alternative for atom
. Just for example, given input like ((x+y) > (a+b))
, the first open paren is the beginning of a c2
, but the second is the beginning of an atom
. [edit: And the parser has no indication of which way to go until some arbitrary point later -- for example, it can't know that the first open paren is the beginning of a c2
until it encounters the >
. For example, if that were a *
instead, then both the opening parens would be beginnings of atom
s.]
One possible way to handle it would be to unify the rules for arithmetic and Boolean expressions, so you only have one rule with '(' expression ')
, and the expression
might be arithmetic or Boolean. This often, however, has the side-effect of producing rather loose typing, with relatively free conversion between arithmetic and Boolean expressions (at least at the parser level -- you can then enforce the types as rigidly as you like in the semantics).
Edit: In Pascal, for example, the rules run something like this (simplifying a tiny bit):
expression: simple_expression ( rel_op simple_expression )*
simple_expression: ( '+' | '-')? term ( ('+' | '-' | 'or' ) term )*
term: factor ( ( '/' | '*' | 'div' | 'mod' | 'and') factor )*
factor: constant | variable | function_call | '(' expression ')' | 'not' factor
Upvotes: 2
Reputation: 170158
Why not simply do:
expr : orExpr;
orExpr : andExpr ('or' andExpr)*;
andExpr : relExpr ('and' relExpr)*;
relExpr : addExpr (relop addExpr)?;
relop : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr : multExpr (('+' | '-') multExpr)*;
multExpr : unaryExpr (('*' | '/') unaryExpr)*;
unaryExpr : 'not'? atom;
atom : INT | FLOAT | ID | 'true' | 'false' | '(' expr ')';
The unary not
usually has a higher precedence than you're trying to do now.
This will allow for expressions like 42 > true
, but checking such semantics can come when you're walking the AST/tree.
EDIT
The input "not(a+b >= 2 * foo/3.14159) == false"
would now be parsed like this (ignoring spaces):
And if you set the output to AST and mix in some tree rewrite operators (^
and !
):
options {
output=AST;
}
// ...
expr : orExpr;
orExpr : andExpr ('or'^ andExpr)*;
andExpr : relExpr ('and'^ relExpr)*;
relExpr : addExpr (relop^ addExpr)?;
relop : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr : multExpr (('+' | '-')^ multExpr)*;
multExpr : unaryExpr (('*' | '/')^ unaryExpr)*;
unaryExpr : 'not'^ atom | atom;
atom : INT | FLOAT | ID | 'true' | 'false' | '('! expr ')'!;
you'd get:
Upvotes: 5
Reputation: 25555
One way to approach this problem is to split it into two sets of lexer rules and apply them sequentially to the input (one for the math stuff, the other for the boolean).
Upvotes: 0
Reputation: 49803
Couldn't you define c1 as the following?
('not')? (('(' condition ')') | boolean)
Upvotes: 0