Antlr4 grammar - trouble identifying grammar

Question

I'm working with Antlr4 to parse a boolean-like DSL.

Here is my grammar:

grammar filter;

filter: overall EOF;

overall
    : LPAREN overall RPAREN 
    | category
    ;

category
    : expression # InferenceCategory
    | category AND category # CategoryAndBlock
    | label COLON expression # CategoryBlock
    | LPAREN category RPAREN # NestedCategory
    ;

expression
    : NOT expression            # NotExpr
    | expression AND expression  # AndExpr
    | expression OR expression   # OrExpr
    | atom                      # AtomExpr
    | LPAREN expression RPAREN  # NestedExpression
    ;

label
    : ALPHANUM
    ;

atom 
    : ALPHANUM
    ;

Here is an example input string to parse:

(cat1:(1 OR 2) AND cat2:( 4 ))

This grammar works fine with this input; it produces the following parse tree which perfectly suits my needs:

However, there is a weird case in the DSL where the "cat1" label is implicit when no other category is specified. This is what the InferenceCategory alternative catches: the bare expression will be handled as a category in my code later.

For example, with

((1 OR 2) AND cat2:( 4 ))

I get (as expected):

However, in the following instance:

cat2:( 4 ) AND (1 OR 2)

I get:

Notice that the second block is identified not as an InferenceCategory but as a normal expression, nested under the first category. This is because the grammar parses ( 4 ) following cat2: as a normal expression, and everything after it is then parsed as part of that same expression.
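To illustrate the effect, here is a minimal hand-written recursive-descent model in Python. It is not ANTLR's actual algorithm and covers only part of the grammar (the `category AND category` alternative is deliberately left out); the `Parser` class, its method names, and the tuple-shaped tree are all hypothetical. It only shows that once `label COLON` hands control to `expression`, the trailing `AND (...)` is consumed there.

```python
import re


def tokenize(text):
    # Split into parentheses, colons, and alphanumeric words; whitespace is skipped.
    return re.findall(r"[()]|:|[A-Za-z0-9]+", text)


class Parser:
    """Hypothetical model of `category` and `expression`, not generated by ANTLR."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def category(self):
        # Models `label COLON expression`: after the colon, expression() takes over.
        if self.peek(1) == ":":
            label = self.take()
            self.take(":")
            return ("CategoryBlock", label, self.expression())
        return ("InferenceCategory", self.expression())

    def expression(self):
        # OR level: lowest precedence, since OrExpr is listed after AndExpr.
        left = self.and_expr()
        while self.peek() == "OR":
            self.take()
            left = ("OR", left, self.and_expr())
        return left

    def and_expr(self):
        # This loop is what swallows a trailing `AND (...)`.
        left = self.unary()
        while self.peek() == "AND":
            self.take()
            left = ("AND", left, self.unary())
        return left

    def unary(self):
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.unary())
        if self.peek() == "(":
            self.take()
            inner = self.expression()
            self.take(")")
            return inner
        tok = self.take()
        if not tok.isalnum():
            raise SyntaxError(f"expected an atom, got {tok!r}")
        return tok


tree = Parser("cat2:( 4 ) AND (1 OR 2)").category()
print(tree)
```

In this model, the whole `( 4 ) AND (1 OR 2)` ends up as the expression of the `cat2` block, which matches the behaviour described above.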

Is there any way to fix this? I've tried:

label COLON expression (AND category)* # CategoryBlock (which doesn't work)

and

category AND category AND category (which "works", but is extremely hacky and only works in the specific case that I have exactly three categories. Any more, and it breaks again.)

TomServo · Accepted Answer

The "alternative labels" like NOT expression # NotExpr do not make a difference in your parse tree. They are semantic-only. They will cause the code generation process to create specific signatures that you can override in your Visitor or Listener.

The rationale behind this is, for example, instead of getting just one Visitor override for expression, you'll get several, one for each alternative label. That way, you don't have to examine expression and determine what type it is before acting on it. Instead, you'll get an override for # OrExpr for example, and as soon as you're in that override code, you know you're dealing with an OR, with an expression on each side of the OR token.
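As a rough sketch of that dispatch, the Python below uses hand-written stand-ins for the context classes and visitor that ANTLR would generate from the labelled alternatives of `expression`. The class bodies, the `left`/`right`/`operand` fields, and `DescribeVisitor` are assumptions made for illustration; the real generated classes look different and carry much more state. The tree is built by hand rather than by a parser.

```python
# Stand-ins for generated context classes: each one routes accept() to the
# visitor method named after its alternative label.
class AtomExprContext:
    def __init__(self, text):
        self.text = text

    def accept(self, visitor):
        return visitor.visitAtomExpr(self)


class NotExprContext:
    def __init__(self, operand):
        self.operand = operand

    def accept(self, visitor):
        return visitor.visitNotExpr(self)


class AndExprContext:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def accept(self, visitor):
        return visitor.visitAndExpr(self)


class OrExprContext:
    def __init__(self, left, right):
        self.left, self.right = left, right

    def accept(self, visitor):
        return visitor.visitOrExpr(self)


class DescribeVisitor:
    """Hypothetical visitor: one method per label, no type inspection needed."""

    def visitAtomExpr(self, ctx):
        return ctx.text

    def visitNotExpr(self, ctx):
        return f"NOT({ctx.operand.accept(self)})"

    def visitAndExpr(self, ctx):
        # Inside this method we already know the node is an AND.
        return f"AND({ctx.left.accept(self)}, {ctx.right.accept(self)})"

    def visitOrExpr(self, ctx):
        # Inside this method we already know the node is an OR.
        return f"OR({ctx.left.accept(self)}, {ctx.right.accept(self)})"


# Hand-built tree standing in for a parse of "NOT 4 OR 1 AND 2".
tree = OrExprContext(
    NotExprContext(AtomExprContext("4")),
    AndExprContext(AtomExprContext("1"), AtomExprContext("2")),
)
description = tree.accept(DescribeVisitor())
print(description)
```

Each `visit...` method only ever receives the node kind its label names, so the visitor never has to test what kind of `expression` it was given.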

The parse tree is useful, but much of the semantics only become apparent when you code up your Listener or Visitor.
