Reputation: 2365
I created one ANTLR grammar for parsing mathematical expressions and a second one for evaluating them. Building an AST and then walking it a second time just to evaluate it seemed like one step too many, so I wanted to refactor my grammar to produce a hierarchy of "Term" objects that represent the expression and carry the logic to perform each operation. The root Term object can then simply be evaluated to a concrete result.
I had to rewrite quite a lot of my grammar and finally got rid of the last error message. Unfortunately, ANTLR now seems to go into an infinite loop when generating the parser.
Could someone here please help me sort out the problem? I think the grammar should be pretty interesting for some, therefore I am posting it. (I should admit that it is based upon a grammar I found with Google, but I have altered it quite a lot to suit my needs.)
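To make the idea concrete, here is a minimal sketch of what such a Term hierarchy might look like. The class names `Term`, `NumberLiteral`, `VariableTerm`, `AddTerm` and `LessThanTerm` match names used in the grammar, but the `evaluate(Map)` signature, the `number` helper and all class bodies are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the evaluate() signature and the class bodies are assumptions,
// not the real classes from de.cware.cweb.services.evaluator.terms.
class TermDemo {
    /** A node of the expression tree that can compute its own value. */
    interface Term {
        Object evaluate(Map<String, Object> variables);
    }

    static final class NumberLiteral implements Term {
        private final double value;
        NumberLiteral(double value) { this.value = value; }
        public Object evaluate(Map<String, Object> variables) { return value; }
    }

    static final class VariableTerm implements Term {
        private final String name;
        VariableTerm(String name) { this.name = name; }
        public Object evaluate(Map<String, Object> variables) {
            if (!variables.containsKey(name)) {
                throw new IllegalArgumentException("Unknown variable: " + name);
            }
            return variables.get(name);
        }
    }

    static final class AddTerm implements Term {
        private final Term left, right;
        AddTerm(Term left, Term right) { this.left = left; this.right = right; }
        public Object evaluate(Map<String, Object> variables) {
            return number(left.evaluate(variables)) + number(right.evaluate(variables));
        }
    }

    static final class LessThanTerm implements Term {
        private final Term left, right;
        LessThanTerm(Term left, Term right) { this.left = left; this.right = right; }
        public Object evaluate(Map<String, Object> variables) {
            return number(left.evaluate(variables)) < number(right.evaluate(variables));
        }
    }

    // Hypothetical helper: unwraps a numeric operand or fails loudly.
    private static double number(Object value) {
        if (value instanceof Number) return ((Number) value).doubleValue();
        throw new IllegalArgumentException("Not a number: " + value);
    }

    public static void main(String[] args) {
        // Hand-built tree for "[x] + 2 < 10"; in the real setup the parser would build it.
        Term root = new LessThanTerm(
                new AddTerm(new VariableTerm("x"), new NumberLiteral(2)),
                new NumberLiteral(10));
        Map<String, Object> variables = new HashMap<>();
        variables.put("x", 3.0);
        System.out.println(root.evaluate(variables)); // prints "true"
    }
}
```

The point of the design is that the parser only has to construct the tree once; evaluating it again with different variable bindings does not require re-parsing.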
grammar SecurityRulesNew;
options {
language = Java;
output=AST;
backtrack = true;
ASTLabelType=CommonTree;
k=2;
}
tokens {
POS;
NEG;
CALL;
}
@header {package de.cware.cweb.services.evaluator.parser;}
@lexer::header{package de.cware.cweb.services.evaluator.parser;}
formula returns [Term term]
: a=expression EOF { $term = a; }
;
expression returns [Term term]
: a=boolExpr { $term = a; }
;
boolExpr returns [Term term]
: a=sumExpr { $term = a; }
| a=sumExpr AND b=boolExpr { $term = new AndTerm(a, b); }
| a=sumExpr OR b=boolExpr { $term = new OrTerm(a, b); }
| a=sumExpr LT b=boolExpr { $term = new LessThanTerm(a, b); }
| a=sumExpr LTEQ b=boolExpr { $term = new LessThanOrEqualTerm(a, b); }
| a=sumExpr GT b=boolExpr { $term = new GreaterThanTerm(a, b); }
| a=sumExpr GTEQ b=boolExpr { $term = new GreaterThanTermOrEqual(a, b); }
| a=sumExpr EQ b=boolExpr { $term = new EqualsTerm(a, b); }
| a=sumExpr NOTEQ b=boolExpr { $term = new NotEqualsTerm(a, b); }
;
sumExpr returns [Term term]
: a=productExpr { $term = a; }
| a=productExpr SUB b=sumExpr { $term = new SubTerm(a, b); }
| a=productExpr ADD b=sumExpr { $term = new AddTerm(a, b); }
;
productExpr returns [Term term]
: a=expExpr { $term = a; }
| a=expExpr DIV productExpr { $term = new DivTerm(a, b); }
| a=expExpr MULT productExpr { $term = new MultTerm(a, b); }
;
expExpr returns [Term term]
: a=unaryOperation { $term = a; }
| a=unaryOperation EXP expExpr { $term = new ExpTerm(a, b); }
;
unaryOperation returns [Term term]
: a=operand { $term = a; }
| NOT a=operand { $term = new NotTerm(a); }
| SUB a=operand { $term = new NegateTerm(a); }
;
operand returns [Term term]
: l=literal { $term = l; }
| f=functionExpr { $term = f; }
| v=VARIABLE { $term = new VariableTerm(v); }
| LPAREN e=expression RPAREN { $term = e; }
;
functionExpr returns [Term term]
: f=FUNCNAME LPAREN! RPAREN! { $term = new CallFunctionTerm(f, null); }
| f=FUNCNAME LPAREN! a=arguments RPAREN! { $term = new CallFunctionTerm(f, a); }
;
arguments returns [List<Term> terms]
: a=expression
{
$terms = new ArrayList<Term>();
$terms.add(a);
}
| a=expression COMMA b=arguments
{
$terms = new ArrayList<Term>();
$terms.add(a);
$terms.addAll(b);
}
;
literal returns [Term term]
: n=NUMBER { $term = new NumberLiteral(n); }
| s=STRING { $term = new StringLiteral(s); }
| t=TRUE { $term = new TrueLiteral(t); }
| f=FALSE { $term = new FalseLiteral(f); }
;
STRING
:
'\"'
( options {greedy=false;}
: ESCAPE_SEQUENCE
| ~'\\'
)*
'\"'
|
'\''
( options {greedy=false;}
: ESCAPE_SEQUENCE
| ~'\\'
)*
'\''
;
WHITESPACE
: (' ' | '\n' | '\t' | '\r')+ {skip();};
TRUE
: ('t'|'T')('r'|'R')('u'|'U')('e'|'E')
;
FALSE
: ('f'|'F')('a'|'A')('l'|'L')('s'|'S')('e'|'E')
;
NOTEQ : '!=';
LTEQ : '<=';
GTEQ : '>=';
AND : '&&';
OR : '||';
NOT : '!';
EQ : '=';
LT : '<';
GT : '>';
EXP : '^';
MULT : '*';
DIV : '/';
ADD : '+';
SUB : '-';
LPAREN : '(';
RPAREN : ')';
COMMA : ',';
PERCENT : '%';
VARIABLE
: '[' ~('[' | ']')+ ']'
;
FUNCNAME
: (LETTER)+
;
NUMBER
: (DIGIT)+ ('.' (DIGIT)+)?
;
fragment
LETTER
: ('a'..'z') | ('A'..'Z')
;
fragment
DIGIT
: ('0'..'9')
;
fragment
ESCAPE_SEQUENCE
: '\\' 't'
| '\\' 'n'
| '\\' '\"'
| '\\' '\''
| '\\' '\\'
;
Help is greatly appreciated.
Chris
Upvotes: 2
Views: 1231
Reputation: 2365
First of all, thank you for that detailed explanation. That really helps :-) ... All of the "$a.term" and similar stuff is sorted out now, and the generated code actually compiles (I had simply hacked that code in, intending to fix those issues as soon as anything was generated at all). I commented out a lot of options and kept on generating until I found the one fragment that seemed to break the build. I had turned on the backtrack feature because some of the errors I got suggested turning it on.
EDIT: Well, I actually refactored the grammar to get rid of the errors without activating backtrack, and now my parser is generated really fast and seems to do its job nicely. Here comes the current version:
grammar SecurityRulesNew;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
/* backtrack = true;*/
}
tokens {
POS;
NEG;
CALL;
}
@header {package de.cware.cweb.services.evaluator.parser;
import de.cware.cweb.services.evaluator.terms.*;}
@lexer::header{package de.cware.cweb.services.evaluator.parser;}
formula returns [Term term]
: a=expression EOF { $term = $a.term; }
;
expression returns [Term term]
: a=boolExpr { $term = $a.term; }
;
boolExpr returns [Term term]
: a=sumExpr (AND! b=boolExpr | OR! c=boolExpr | LT! d=boolExpr | LTEQ! e=boolExpr | GT! f=boolExpr | GTEQ! g=boolExpr | EQ! h=boolExpr | NOTEQ! i=boolExpr)? {
if(b != null) {
$term = new AndTerm($a.term, $b.term);
} else if(c != null) {
$term = new OrTerm($a.term, $c.term);
} else if(d != null) {
$term = new LessThanTerm($a.term, $d.term);
} else if(e != null) {
$term = new LessThanOrEqualTerm($a.term, $e.term);
} else if(f != null) {
$term = new GreaterThanTerm($a.term, $f.term);
} else if(g != null) {
$term = new GreaterThanOrEqualTerm($a.term, $g.term);
} else if(h != null) {
$term = new EqualsTerm($a.term, $h.term);
} else if(i != null) {
$term = new NotEqualsTerm($a.term, $i.term);
} else {
$term = $a.term;
}
}
;
sumExpr returns [Term term]
: a=productExpr (SUB! b=sumExpr | ADD! c=sumExpr)?
{
if(b != null) {
$term = new SubTerm($a.term, $b.term);
} else if(c != null) {
$term = new AddTerm($a.term, $c.term);
} else {
$term = $a.term;
}
}
;
productExpr returns [Term term]
: a=expExpr (DIV! b=productExpr | MULT! c=productExpr)?
{
if(b != null) {
$term = new DivTerm($a.term, $b.term);
} else if(c != null) {
$term = new MultTerm($a.term, $c.term);
} else {
$term = $a.term;
}
}
;
expExpr returns [Term term]
: a=unaryOperation (EXP! b=expExpr)?
{
if(b != null) {
$term = new ExpTerm($a.term, $b.term);
} else {
$term = $a.term;
}
}
;
unaryOperation returns [Term term]
: a=operand { $term = $a.term; }
| NOT! a=operand { $term = new NotTerm($a.term); }
| SUB! a=operand { $term = new NegateTerm($a.term); }
| LPAREN! e=expression RPAREN! { $term = $e.term; }
;
operand returns [Term term]
: l=literal { $term = $l.term; }
| v=VARIABLE { $term = new VariableTerm($v.text); }
| f=functionExpr { $term = $f.term; }
;
functionExpr returns [Term term]
: f=FUNCNAME LPAREN! (a=arguments)? RPAREN! { $term = new CallFunctionTerm($f.text, $a.terms); }
;
arguments returns [List<Term> terms]
: a=expression (COMMA b=arguments)?
{
$terms = new ArrayList<Term>();
$terms.add($a.term);
if(b != null) {
$terms.addAll($b.terms);
}
}
;
literal returns [Term term]
: n=NUMBER { $term = new NumberLiteral(Double.valueOf($n.text)); }
| s=STRING { $term = new StringLiteral($s.text.substring(1, s.getText().length() - 1)); }
| TRUE! { $term = new TrueLiteral(); }
| FALSE! { $term = new FalseLiteral(); }
;
STRING
:
'\"'
( options {greedy=false;}
: ESCAPE_SEQUENCE
| ~'\\'
)*
'\"'
|
'\''
( options {greedy=false;}
: ESCAPE_SEQUENCE
| ~'\\'
)*
'\''
;
WHITESPACE
: (' ' | '\n' | '\t' | '\r')+ {skip();};
TRUE
: ('t'|'T')('r'|'R')('u'|'U')('e'|'E')
;
FALSE
: ('f'|'F')('a'|'A')('l'|'L')('s'|'S')('e'|'E')
;
NOTEQ : '!=';
LTEQ : '<=';
GTEQ : '>=';
AND : '&&';
OR : '||';
NOT : '!';
EQ : '=';
LT : '<';
GT : '>';
EXP : '^';
MULT : '*';
DIV : '/';
ADD : '+';
SUB : '-';
LPAREN : '(';
RPAREN : ')';
COMMA : ',';
PERCENT : '%';
VARIABLE
: '[' ~('[' | ']')+ ']'
;
FUNCNAME
: (LETTER)+
;
NUMBER
: (DIGIT)+ ('.' (DIGIT)+)?
;
fragment
LETTER
: ('a'..'z') | ('A'..'Z')
;
fragment
DIGIT
: ('0'..'9')
;
fragment
ESCAPE_SEQUENCE
: '\\' 't'
| '\\' 'n'
| '\\' '\"'
| '\\' '\''
| '\\' '\\'
;
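As a rough illustration of why the left-factored rules above get by without backtracking, here is a hand-written Java analogue of the `sumExpr` rule. It is not ANTLR-generated code; the class `LeftFactoredDemo`, its methods and the string-based token list are all hypothetical. The shared prefix is parsed once and a single token of lookahead picks the branch. The sketch also shows a side effect that may be worth checking: with the right-recursive shape, `10 - 3 - 2` groups as `10 - (3 - 2)`, whereas a loop that folds to the left gives the conventional `(10 - 3) - 2`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written analogue of the sumExpr rule; not ANTLR output.
class LeftFactoredDemo {
    private final List<String> tokens;
    private int pos = 0;

    LeftFactoredDemo(String... tokens) { this.tokens = Arrays.asList(tokens); }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<EOF>"; }
    private String next() { return tokens.get(pos++); }

    // Operands are plain numbers here to keep the sketch small.
    private double operand() { return Double.parseDouble(next()); }

    // Mirrors: a=operand (SUB b=sumExpr | ADD c=sumExpr)?
    // The common prefix is parsed once; one token of lookahead selects the branch.
    double sumRightRecursive() {
        double a = operand();
        String op = peek();
        if (op.equals("-")) { next(); return a - sumRightRecursive(); }
        if (op.equals("+")) { next(); return a + sumRightRecursive(); }
        return a;
    }

    // Loop variant, mirroring: a=operand ((SUB | ADD) b=operand)*
    // Each new operand is folded into the running result, so grouping is to the left.
    double sumLeftAssociative() {
        double result = operand();
        while (peek().equals("-") || peek().equals("+")) {
            String op = next();
            double b = operand();
            result = op.equals("-") ? result - b : result + b;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new LeftFactoredDemo("10", "-", "3", "-", "2").sumRightRecursive());  // prints 9.0
        System.out.println(new LeftFactoredDemo("10", "-", "3", "-", "2").sumLeftAssociative()); // prints 5.0
    }
}
```

If left-associative subtraction and division are intended, the corresponding grammar rules could be written in the loop form, for example by assigning `$term` after the first operand and wrapping it in a new `SubTerm` or `AddTerm` inside a `( ... )*` block.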
Thanks again for your explanation ... it got me on the right track :-)
Chris
Upvotes: 0
Reputation: 170178
Because your grammar is so incredibly ambiguous, ANTLR has a problem creating a parser. Apparently ANTLR 3.3+ chokes on it, but ANTLR 3.2 (with less time than 3.3+) produces the following error:
error(10): internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279): could not even do k=1 for decision 1; reason: timed out (>1000ms)
For a simple expression parser, you really shouldn't use backtrack=true.
Besides the fact your grammar is ambiguous, much of your embedded code contains errors.
Let's have a look at your formula rule:
formula returns [Term term]
: a=expression EOF { $term = a; }
;
Also, the return type of a rule should be explicitly defined. The a in { $term = a; } should have a $ in front of it:
formula returns [Term term]
: a=expression EOF { $term = $a; }
;
but then $a refers to the entire "thing" that expression returns. You then have to "tell" ANTLR that you want the Term this expression creates. This can be done like this:
formula returns [Term term]
: a=expression EOF { $term = $a.term; }
;
expression returns [Term term]
: a=boolExpr { $term = $a.term; }
;
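The difference between `$a` and `$a.term` may be easier to see in plain Java. The following is a simplified, hypothetical imitation of the shape of the code ANTLR 3 generates for a rule with a `returns [...]` clause when `output=AST` is set; the names `ReturnScopeDemo` and `ExpressionReturn` are illustrative and the parsing itself is faked.

```java
// Simplified, hypothetical imitation of ANTLR 3's generated code shape.
// The real return scope also carries bookkeeping such as start/stop tokens.
class ReturnScopeDemo {
    interface Term { double evaluate(); }

    // Stand-in for the generated return scope of the "expression" rule:
    // it wraps the value declared in "returns [Term term]".
    static final class ExpressionReturn {
        Term term;
    }

    // Stand-in for the generated expression() method; parsing is faked here.
    static ExpressionReturn expression(String input) {
        ExpressionReturn retval = new ExpressionReturn();
        final double value = Double.parseDouble(input.trim());
        retval.term = () -> value;
        return retval;
    }

    // Stand-in for: formula : a=expression EOF { $term = $a.term; }
    static Term formula(String input) {
        ExpressionReturn a = expression(input);
        // Term t = a;  // roughly what "$term = $a" asks for: a is the scope object, not a Term
        return a != null ? a.term : null; // roughly what "$a.term" asks for
    }

    public static void main(String[] args) {
        System.out.println(formula("42").evaluate()); // prints 42.0
    }
}
```

In other words, the label `a` names the whole return scope, and the declared return value is one field inside it.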
It looks like you've converted some LR grammar into an ANTLR grammar (note that although ANTLR ends with LR, ANTLR 3.x is an LL parser generator) and, without testing in between, hoped it would all work. Unfortunately, it doesn't. There's too much wrong with it to produce a small working example based on your grammar, so I'd have a look at an existing expression parser based on an ANTLR grammar and try again. Have a look at these Q&A's:
Upvotes: 1