Reputation: 677
I want to create a parser using antlr for the following string:
"1 AND (2 OR (3 AND 4)) AND 5" -> so i want to have AND and OR operations which should result in a tree after parsing was successful. this should result in the following tree:
AND
- 1
- OR
- 2
- AND
-3
-4
- 5
i also want to avoid unclear inputs like "1 AND 2 OR 3" as it is not clear how to construct the tree from that. And it also seems like the parser "accepts" input with trailing sings such as "1 AND 2asdf".
what i have so far is (not working as expected):
grammar code;
options {
language=CSharp3;
output=AST;
ASTLabelType=CommonTree;
//backtrack=true;
}
tokens {
ROOT;
}
@rulecatch {
catch {
throw;
}
}
@parser::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::namespace { Web.DealerNet.Areas.QueryBuilder.Parser }
@lexer::members {
public override void ReportError(RecognitionException e) {
throw e;
}
}
public parse : exp EOF -> ^(ROOT exp);
exp
: atom
( And^ atom (And! atom)*
| Or^ atom (Or! atom)*
)?
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+
;
And
: 'AND' | 'and'
;
Or
: 'OR' | 'or'
;
WS : (' '|'\t'|'\f'|'\n'|'\r')+{ Skip(); };
Hope someone of you guys can help me get on the right track!
edit: and how can i archieve "1 AND 2 AND 3" to result in
AND
1
2
3
instead of
AND
AND
1
2
3
EDIT:
thanks for the great solution, it works like a charm except for one thing: when i call the parse() method on the following term "1 AND (2 OR (1 AND 3) AND 4" (closing bracket missing) the parser still accepts the input as valid.
this is my code so far: grammar code;
options {
language=CSharp3;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ROOT;
}
@rulecatch {
catch {
throw;
}
}
@lexer::members {
public override void ReportError(RecognitionException e) {
throw e;
}
}
public parse
: exp -> ^(ROOT exp)
;
exp
: atom
( And^ atom (And! atom)*
| Or^ atom (Or! atom)*
)?
;
atom
: Number
| '(' exp ')' -> exp
;
Number
: ('0'..'9')+
;
And
: 'AND' | 'and'
;
Or
: 'OR' | 'or'
;
WS : (' '|'\t'|'\f'|'\n'|'\r')+{ Skip(); };
edit2: i just found another problem with my grammar: when i have input like "1 AND 2 OR 3" the grammar gets parsed just fine, but it should fail because either the "1 AND 2" needs to be inside brackets or the "2 OR 3" part. i dont understand why the parser runs through as in my opinion this grammar should really cover that case. is there any sort of online-testing-environment or so to find the problem? (i tried antlrWorks but the errors given there dont lead me anywhere...)
edit3: updated the code to represent the new grammar like suggested.
i still have the same problem that the following grammar:
public parse : exp EOF -> ^(ROOT exp);
doesnt parse to the end.. the generated c# sources seem to just ignore the EOF... can you provide any further guidance on how i could resolve the issue?
edit4 i still have the same problem that the following grammar:
public parse : exp EOF -> ^(ROOT exp);
doesnt parse to the end.. the generated c# sources seem to just ignore the EOF... can you provide any further guidance on how i could resolve the issue?
the problem seems to be in this part of the code:
EOF2=(IToken)Match(input,EOF,Follow._EOF_in_parse97);
stream_EOF.Add(EOF2);
When i add the following code (just a hack) it works...
if (EOF2.Text == "<missing EOF>") {
throw new Exception(EOF2.Text);
}
can i change anything so the parser gets generated correclty from the start?
Upvotes: 0
Views: 687
Reputation: 99939
This rule will disallow expressions containing both AND
and OR
without parentheses. It will also construct the parse tree you described by making the first AND
or OR
token the root of the AST, and then hiding the rest of the AND
or OR
tokens from the same expression.
exp
: atom
( 'AND'^ atom ('AND'! atom)*
| 'OR'^ atom ('OR'! atom)*
)?
;
Edit: The second problem is unrelated to this. If you don't instruct ANTLR to consume all input by including an explicit EOF
symbol in one of your parser rules, then it is allowed to consume only a portion of the input in an attempt to successfully match something.
The original parse
rule says "match some input as exp
". The following modification to the parse
rule says "match the entire input as exp
".
public parse : exp EOF -> ^(ROOT exp);
Upvotes: 1