Michael Bray

Reputation: 15265

Irrelevant rule breaks ANTLR4 grammar

I'm building an ANTLR4 grammar to parse strings from a data source. It's similar to, if not pretty much the same as, StringTemplate, except that I don't like that syntax, so I'm writing my own (also just for fun and learning, as this is my first experience with ANTLR). My grammar currently looks like this (simplified from what I actually have, but I've verified that it is a good example and exhibits the same problem I'm asking about):

grammar Combined1;

file: 
    .*? (repToken .*?)+
    | .*?
    ;

foreach: '@foreach' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
with: '@with' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
// withx: '@withx' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;

repvar: '@' (
    '$'
    | '(' nestedIdentifier ')'
    | nestedIdentifier 
    ) ;

repToken:
    foreach
    | with
    // | withx
    | repvar
    ;

nestedIdentifier: Identifier ('.' Identifier)* ;
Identifier: [A-Za-z_] [A-Za-z0-9_]* ;
WS: [ \t\r\n] ;
Other: ( . ) ;

This grammar works just fine, allowing me to perform replacements such as:

string template = "Test: @foreach(@list){@$}";
Process(template, new { list = new [] { "A", "B", "C" } });

and the result would be:

Test: ABC

(The mechanics of how I process the tree to get this result are relatively simple but not relevant to the question, so I'm not providing that code.)

My question is this: if I uncomment the withx rule right below the with rule but forget to uncomment withx in the list of alternatives in repToken, then my example above breaks, even though it has absolutely nothing to do with withx. Once I add withx as an alternative to repToken, my example works again. Why?

Here's what I know:

I'm completely baffled why adding a parser rule that doesn't have anything to do with my input would cause my parser to fail. Help!?

EDIT 3/17/2014: JavaMan asked for a parse tree in each scenario to clarify the description above. I don't know how to generate the parse tree graphic that he did, but here are two screenshots from the Visual Studio debugger illustrating the difference. Note that in these images I use longer names; specifically, ReplacementTokenContext is for repToken.

The first one is when I DO include withx in the alternative list (note that the tree is essentially FileContext -> ReplacementTokenContext (node index 3) -> ForeachContext). [Screenshot: Visual Studio Watch when I include withx]

And the second is when I DO NOT include withx in the alternative list (note that the tree is essentially FileContext -> TerminalNodeImpl "@foreach" (node index 3)). [Screenshot: Visual Studio Watch when I DO NOT include withx]

Upvotes: 1

Views: 564

Answers (1)

JavaMan

Reputation: 5034

With your whole grammar plus the withx rule and the two lines of input, I get a parse tree in which a repToken node groups the @foreach input text under a foreach node.

It looks like a correct parse to me. Is this what you want? Could it be a problem with your visitor code? Did you get the same parse tree? It would help if you could post your parse tree here.

By the way, how about sending all whitespace to a hidden channel and deleting the WS tokens from the parser rules?
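A minimal sketch of that suggestion, assuming the rest of the grammar (file, repvar, repToken, nestedIdentifier, Identifier, Other) stays exactly as posted in the question. Only the rules that mention WS change; I have not run this against the C# target, so treat it as a starting point rather than a verified fix:

```antlr
// Parser rules: the explicit WS* references are gone, because the parser
// no longer sees whitespace tokens at all.
foreach : '@foreach' '(' repvar ')' '{' content=file '}' ;
with    : '@with'    '(' repvar ')' '{' content=file '}' ;

// Lexer rule: whitespace is still tokenized, but it is routed to the
// hidden channel instead of the default channel the parser reads from.
WS : [ \t\r\n]+ -> channel(HIDDEN) ;
```

One trade-off to keep in mind: since this is a template processor, whitespace in the literal text around the tokens presumably matters. Hidden-channel tokens remain in the token stream but do not appear as parse tree nodes, so the tree-processing code would likely need to recover the original text from the token stream rather than by concatenating terminal nodes. Hiding whitespace also means input such as "@ list" would be accepted as a repvar, which the original grammar does not allow.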

EDIT:

I'm using ANTLR 4.1 with the Java target only, so I cannot be sure whether it is a bug in the C# target or in v4.2. But both grammars give me the same parse tree in Java. There is a tool called TestRig (at least in the Java target) that can generate the parse tree in either GUI or ASCII form:

java org.antlr.v4.runtime.misc.TestRig Combined1 file -tree in.cpp > treeres.txt

Running the above command with the two versions of the grammar you mentioned and the same input file, I got the same ASCII representation of the parse tree:

(file string   template   =   " Test :   (repToken (foreach @foreach ( (repvar @ (nestedIdentifier list)) ) { (file (repToken (repvar @ $))) })) " ; \r \n Process ( template ,   new   {   list   =   new   [ ]   {   " A " ,   " B " ,   " C "   }   } ) ;)

The graphical output is too big, so I don't include it here. So at least in Java, the same parse tree is generated with or without the withx rule.

I suggest you double-check with the TestRig tool or try verifying with the Java target.

Upvotes: 1
