Reputation: 1223
I am a newbie to parsing and ANTLR, so apologies in advance if this is a silly question. I am trying to use ANTLR to parse logical formulae (involving AND, OR, NOT). I have the following grammar
grammar LTL;
ltl: ltl_or | ltl_and | ltl_not | Atom;
ltl_and: OPEN_CURLY ltl CLOSE_CURLY And OPEN_CURLY ltl CLOSE_CURLY;
ltl_or: OPEN_CURLY ltl CLOSE_CURLY Or OPEN_CURLY ltl CLOSE_CURLY;
ltl_not: Not OPEN_CURLY ltl CLOSE_CURLY;
Or: 'Or' | 'or';
Atom: (~('{' | '}'))+;
And: 'And' | 'and';
Not: 'Not' | 'not';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
My test case is {ABCD()}Or{AB CD()}
, which I expect should be parsed as a valid Or
formula with two atoms. And that does happen with the grammar above. However, If I swap the positions of the Or
rule and the Atom
rule, as follows:
Atom: (~('{' | '}'))+;
Or: 'Or' | 'or';
I get the error
line 1:8 no viable alternative at input '{ABCD()}Or'
Could someone please provide some pointers?
Upvotes: 0
Views: 413
Reputation: 6785
Here's a refactored version of your grammar that correctly handles the input you've provided, and takes advantage of other ANTLR idioms that (hopefully) make it easier to read and maintain. (With comments to indicate what some changes were made.)
grammar LTL
;
// *almost* always a good idea to have a start rule that
// consumes everything up to EOF to prevent
// premature termination of parsing input
ltl_start: ltl EOF;
// *IF* you really want to allow a "naked" ATOM
// ltl_start: (ltl | ATOM) EOF;
// since it seems ltls are always wrapped in curlies,
// I've refactored them a bit to simplify also
// used labelled alternatives that are generally easier to read
// Note, you can use literals in your rules that match token rules
// and ANTLR will match them up I find this frequently makes the
// grammar easier to read.
ltl
: ltl OR ltl # ltl_or
| ltl AND ltl # ltl_and
| NOT ltl # ltl_not
| '{' ATOM '}' # ltl_atom
;
// Symbols
OPEN_CURLY: '{';
CLOSE_CURLY: '}';
// keywords
OR: 'Or' | 'or';
AND: 'And' | 'and';
NOT: 'Not' | 'not';
// place ATOM *after* keywords to prevent keywords being misidentified as ATOMs
ATOM: ~[{}]+; // Another way of saying "Atom: (~('{' | '}'))+;"
Though not really necessary, I prefer to name Lexer rules in all-caps so that really stand out from parser rules (individual taste)
For Lexer rules, it's very common to have multiple lexer rules that could match a particular sequence of characters (i.e. ambiguities). Avoiding these entirely would be VERY painful, so ANTLR allows for them and resolves the ambiguities with these two rules:
Upvotes: 2