Reputation: 129

How to resolve this ambiguous grammar?

I have written this grammar:

expr        : multExpr ( ('+' | '-') multExpr )*;
multExpr    : atom ( ('*' | '/') atom )*;
atom    : INT | FLOAT | ID | '(' expr ')';
condition   : cond ('or' cond)*;
cond    : c1 ('and' c1)*;
c1      : ('not')? c2;
c2      : '(' condition ')' | boolean;
boolean : expr (relop expr | ²) | 'true' | 'false';
relop   : '<' | '<=' | '>' | '>=' | '==' | '!=';

I have omitted the lexer rules for INT,FLOAT,ID as it is obvious.

The problem is c2 rule, it is ambiguous because of '(', I could not find the solution, can you offer me a solution?

Upvotes: 3

Answers (4)

Jerry Coffin

Reputation: 490728

You problem stems from the fact that the '(' could be the start of either the first alternative for c2 or the last alternative for atom. Just for example, given input like ((x+y) > (a+b)), the first open paren is the beginning of a c2, but the second is the beginning of an atom. [edit: And the parser has no indication of which way to go until some arbitrary point later -- for example, it can't know that the first open paren is the beginning of a c2 until it encounters the >. For example, if that were a * instead, then both the opening parens would be beginnings of atoms.]

One possible way to handle it would be to unify the rules for arithmetic and Boolean expressions, so you only have one rule with '(' expression '), and the expression might be arithmetic or Boolean. This often, however, has the side-effect of producing rather loose typing, with relatively free conversion between arithmetic and Boolean expressions (at least at the parser level -- you can then enforce the types as rigidly as you like in the semantics).

Edit: In Pascal, for example, the rules run something like this (simplifying a tiny bit):

expression: simple_expression ( rel_op simple_expression )*

simple_expression: ( '+' | '-')? term ( ('+' | '-' | 'or' ) term )*

term: factor ( ( '/' | '*' | 'div' | 'mod' | 'and') factor )*

factor: constant | variable | function_call | '(' expression ')' | 'not' factor

Upvotes: 2

Bart Kiers

Reputation: 170308

Why not simply do:

expr      : orExpr; 
orExpr    : andExpr ('or' andExpr)*;
andExpr   : relExpr ('and' relExpr)*;
relExpr   : addExpr (relop addExpr)?;
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr   : multExpr (('+' | '-') multExpr)*;
multExpr  : unaryExpr (('*' | '/') unaryExpr)*;
unaryExpr : 'not'? atom;
atom      : INT | FLOAT | ID | 'true' | 'false' | '(' expr ')';

The unary not usually has a higher precedence than you're trying to do now.

This will allow for expressions like 42 > true, but checking such semantics can come when you're walking the AST/tree.

EDIT

The input "not(a+b >= 2 * foo/3.14159) == false" would now be parsed like this (ignoring spaces):

enter image description here

And if you set the output to AST and mix in some tree rewrite operators (^ and !):

options {
  output=AST;
}

// ...

expr      : orExpr; 
orExpr    : andExpr ('or'^ andExpr)*;
andExpr   : relExpr ('and'^ relExpr)*;
relExpr   : addExpr (relop^ addExpr)?;
relop     : '<' | '<=' | '>' | '>=' | '==' | '!=';
addExpr   : multExpr (('+' | '-')^ multExpr)*;
multExpr  : unaryExpr (('*' | '/')^ unaryExpr)*;
unaryExpr : 'not'^ atom | atom;
atom      : INT | FLOAT | ID | 'true' | 'false' | '('! expr ')'!;

you'd get:

enter image description here

Upvotes: 5

Bill

Reputation: 25565

One way to approach this problem is to split it into two sets of lexer rules and apply them sequentially to the input (one for the math stuff, the other for the boolean).

Upvotes: 0

Scott Hunter

Reputation: 49920

Couldn't you define c1 as the following?

('not')? (('(' condition ')') | boolean)

Upvotes: 0

How to resolve this ambiguous grammar?

Answers (4)

Related Questions