SPMP
SPMP

Reputation: 1223

ANTLR syntax error depending on where a rule is defined

I am a newbie to parsing and ANTLR, so apologies in advance if this is a silly question. I am trying to use ANTLR to parse logical formulae (involving AND, OR, NOT). I have the following grammar

grammar LTL;

ltl: ltl_or | ltl_and | ltl_not | Atom;

ltl_and: OPEN_CURLY ltl CLOSE_CURLY And OPEN_CURLY ltl CLOSE_CURLY;
ltl_or: OPEN_CURLY ltl CLOSE_CURLY Or OPEN_CURLY ltl CLOSE_CURLY;
ltl_not: Not OPEN_CURLY ltl CLOSE_CURLY;

Or: 'Or' | 'or';
Atom: (~('{' | '}'))+;
And: 'And' | 'and';
Not: 'Not' | 'not';
OPEN_CURLY: '{';
CLOSE_CURLY: '}';

My test case is {ABCD()}Or{AB CD()}, which I expect should be parsed as a valid Or formula with two atoms. And that does happen with the grammar above. However, If I swap the positions of the Or rule and the Atom rule, as follows:

Atom: (~('{' | '}'))+;
Or: 'Or' | 'or';

I get the error

line 1:8 no viable alternative at input '{ABCD()}Or'

Could someone please provide some pointers?

Upvotes: 0

Views: 413

Answers (1)

Mike Cargal
Mike Cargal

Reputation: 6785

Here's a refactored version of your grammar that correctly handles the input you've provided, and takes advantage of other ANTLR idioms that (hopefully) make it easier to read and maintain. (With comments to indicate what some changes were made.)

grammar LTL
    ;

// *almost* always a good idea to have a start rule that 
// consumes everything up to EOF to prevent
// premature termination of parsing input
ltl_start: ltl EOF;

// *IF* you really want to allow a "naked" ATOM 

// ltl_start: (ltl | ATOM) EOF;

// since it seems ltls are always wrapped in curlies, 
// I've refactored them a bit to simplify also
// used labelled alternatives that are generally easier to read 
// Note, you can use literals in your rules that match token rules 
// and ANTLR will match them up I find this frequently makes the
// grammar easier to read.
ltl
    : ltl OR ltl   # ltl_or
    | ltl AND ltl  # ltl_and
    | NOT ltl      # ltl_not
    | '{' ATOM '}' # ltl_atom
    ;

// Symbols
OPEN_CURLY:  '{';
CLOSE_CURLY: '}';

// keywords
OR:  'Or' | 'or';
AND: 'And' | 'and';
NOT: 'Not' | 'not';

// place ATOM *after* keywords to prevent keywords being misidentified as ATOMs
ATOM: ~[{}]+; // Another way of saying "Atom: (~('{' | '}'))+;"

Though not really necessary, I prefer to name Lexer rules in all-caps so that really stand out from parser rules (individual taste)

For Lexer rules, it's very common to have multiple lexer rules that could match a particular sequence of characters (i.e. ambiguities). Avoiding these entirely would be VERY painful, so ANTLR allows for them and resolves the ambiguities with these two rules:

  • If more than one Lexer rule matches input, then the one that matches the longest sequence of characters is used.
  • If multiple rules match the same length sequence of characters, then the Lexer rule that appears first in the grammar is used.

Upvotes: 2

Related Questions