Reputation: 15265
I'm building an ANTLR4 grammar to parse strings from a data source. It is similar to, if not pretty much the same as, StringTemplate, except I don't like that syntax so I'm writing my own (also just for fun and learning, as this is my first experience with ANTLR). My grammar currently looks like this (it is simplified from what I actually have, but I've verified that it is a "good example" and exhibits the same problem I'm asking about):
grammar Combined1;
file:
.*? (repToken .*?)+
| .*?
;
foreach: '@foreach' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
with: '@with' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
// withx: '@withx' WS* '(' WS* repvar WS* ')' WS* '{' content=file '}' ;
repvar: '@' (
'$'
| '(' nestedIdentifier ')'
| nestedIdentifier
) ;
repToken:
foreach
| with
// | withx
| repvar
;
nestedIdentifier: Identifier ('.' Identifier)* ;
Identifier: [A-Za-z_] [A-Za-z0-9_]* ;
WS: [ \t\r\n] ;
Other: ( . ) ;
This grammar works just fine, allowing me to perform replacements such as:
string template = "Test: @foreach(@list){@$}";
Process(template, new { list = new [] { "A", "B", "C" } });
and the result would be:
Test: ABC
(The mechanics of how I process the tree to get this result are relatively simple but not relevant to the question, so I'm not providing that code.)
My question is this: if I include (uncomment) the withx rule right below the with: rule, and I forget to add (uncomment) withx in the alternatives of repToken, then my example above breaks, even though it has absolutely nothing to do with withx. Once I add withx as an alternative to repToken, my example works again. Why??
Here's what I know:
Whether withx is included or not, my lexer correctly returns 12 tokens: Test, :, ' ', @foreach, (, @, list, ), {, @, item. This isn't surprising as I've only added a parser rule, and not touched the lexer tokens (aside from adding the one implicit token '@withx').
When the example works, my parser correctly groups all the tokens after @foreach as children of the ForeachContext, resulting in a FileContext with 4 children (3 TerminalNodeImpl and a RepTokenContext).
When the example breaks (the withx rule defined but not listed in repToken), my parser for some reason doesn't recognize the rest of the tokens as belonging to ForeachContext, resulting in a FileContext with 10 children, none of which is a ForeachContext; they are all TerminalNodeImpl except for 2 RepTokenContext corresponding to @list and @$.
I'm completely baffled why adding a parser rule that doesn't have anything to do with my input would cause my parser to fail. Help!?
EDIT 3/17/2014: JavaMan asked for a parse tree in each scenario to clarify the description above. I don't know how to generate the parse tree graphic that he did, but here are two screenshots from the Visual Studio debugger illustrating the difference. Note that in these images I use longer names; specifically, ReplacementTokenContext is for repToken.
The first one is when I DO include withx in the alternative list (note that the tree is essentially FileContext -> ReplacementTokenContext (node index 3) -> ForeachContext):
And the second is when I DO NOT include withx in the alternative list (note that the tree is essentially FileContext -> TerminalNodeImpl "@foreach" (node index 3)):
Upvotes: 1
Views: 564
Reputation: 5034
With your whole grammar plus the withx rule and the 2 lines of input, I am able to obtain this parse tree, with a repToken node grouping the @foreach input text under a foreach node:
It looks like a correct parse to me. Is this what you want? Could it be a problem with your visitor code? Did you get the same parse tree? It would be better if you could post your parse tree here.
By the way, what about sending all whitespace to a hidden channel and deleting all the WS tokens from the parser rules?
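A minimal sketch of that whitespace suggestion, assuming the rest of the grammar stays as posted. The `+` on WS is my own assumption (the original WS matches a single character), and I have not checked how this interacts with the rest of your real grammar.

```antlr
// Sketch only: whitespace goes to the hidden channel, so the parser
// never sees it and the rules no longer need explicit WS* everywhere.
foreach: '@foreach' '(' repvar ')' '{' content=file '}' ;
with:    '@with'    '(' repvar ')' '{' content=file '}' ;

// Assumption: match runs of whitespace as one token instead of one
// token per character, and route them to the hidden channel.
WS: [ \t\r\n]+ -> channel(HIDDEN) ;
```

One trade-off to keep in mind: since this is a template language, whitespace in the literal text matters for the output. Hidden tokens stay in the token stream but no longer show up as children in the parse tree, so the processing code would have to recover them from the token stream rather than from the tree.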
EDIT:
I'm using ANTLR 4.1 with the Java target only, so I cannot be sure whether this is a bug in the C# target or in v4.2. But both grammars give me the same parse tree in Java. There is a tool called TestRig (at least in the Java target) that can generate the parse tree in either GUI or ASCII form:
java org.antlr.v4.runtime.misc.TestRig Combine1 file -tree in.cpp > treeres.txt
By running the above command with the 2 versions of the grammar you mentioned and the same input file, I got the same ASCII representation of the parse tree:
(file string template = " Test : (repToken (foreach @foreach ( (repvar @ (nestedIdentifier list)) ) { (file (repToken (repvar @ $))) })) " ; \r \n Process ( template , new { list = new [ ] { " A " , " B " , " C " } } ) ;)
The graphical output is too big, so I don't include it here. So at least in Java, the same parse tree is generated with or without the withx rule.
I suggest you double-check with the TestRig tool or try verifying with the Java target.
Upvotes: 1