Reputation: 5157
I have the following grammar:
grammar Hello;
prog: stat+ EOF;
stat: DELIMITER_OPEN expr DELIMITER_CLOSE;
expr: NOTES COMMA value=VAR_VALUE #delim_body;
VAR_VALUE: ANBang*;
NOTES: WS* 'notes' WS*;
COMMA: ',';
DELIMITER_OPEN: '<<!';
DELIMITER_CLOSE: '!>>';
fragment ANBang: AlphaNum | Bang;
fragment AlphaNum: [a-zA-Z0-9];
fragment Bang: '!';
WS : [ \t\r\n]+ -> skip ;
Parsing the following works:
<<! notes, Test !>>
and the variable value is "Test". However, the parser fails when I remove the space between DELIMITER_OPEN and NOTES:
<<!notes, Test !>>
line 1:3 mismatched input 'notes' expecting NOTES
Upvotes: 0
Views: 1839
Reputation: 51330
This is yet another case of badly ordered lexer rules.
When the lexer scans for the next token, it first tries to find the rule which will match the longest token. If several rules match, it will disambiguate by choosing the first one in definition order.
<<! notes, Test !>>
will be tokenized as such:
DELIMITER_OPEN
NOTES
COMMA
VAR_VALUE
WS
DELIMITER_CLOSE
This is because the NOTES rule can match the following:
<<! notes, Test !>>
   \____/
That match includes the leading whitespace. If you remove it:
<<!notes, Test !>>
Then both the NOTES and VAR_VALUE rules match the same five characters, notes. Since the matches are the same length and VAR_VALUE is defined first in the grammar, it gets precedence. The tokenization is:
DELIMITER_OPEN
VAR_VALUE
COMMA
VAR_VALUE
WS
DELIMITER_CLOSE
and it doesn't match your expr rule.
Change your rules like this to fix the problem:
NOTES: 'notes';
VAR_VALUE: ANBang+;
Adding WS* to other rules doesn't make much sense, since WS is skipped. Declaring a token that can have zero width (with *) is also meaningless, so use + instead. Finally, reorder the rules so that the most specific ones match first.
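Put together, those changes might look like the following sketch. It keeps your parser rules as they are and only touches the lexer: NOTES is placed ahead of VAR_VALUE and the WS* wrappers are dropped. Moving the delimiters and COMMA to the top is my own choice for readability; for this particular problem only the relative order of NOTES and VAR_VALUE matters.

```antlr
grammar Hello;

// Parser rules, unchanged from the question
prog: stat+ EOF;
stat: DELIMITER_OPEN expr DELIMITER_CLOSE;
expr: NOTES COMMA value=VAR_VALUE #delim_body;

// Lexer rules: most specific first
DELIMITER_OPEN: '<<!';
DELIMITER_CLOSE: '!>>';
COMMA: ',';
NOTES: 'notes';        // keyword; wins the tie against VAR_VALUE because it is listed first
VAR_VALUE: ANBang+;    // + instead of *, so the token can never be empty

fragment ANBang: AlphaNum | Bang;
fragment AlphaNum: [a-zA-Z0-9];
fragment Bang: '!';

WS: [ \t\r\n]+ -> skip; // whitespace is discarded between tokens
```

With this ordering, both `<<! notes, Test !>>` and `<<!notes, Test !>>` should tokenize as DELIMITER_OPEN NOTES COMMA VAR_VALUE DELIMITER_CLOSE. Note that because VAR_VALUE still accepts `!`, an input such as `Test!>>` with no space before the closing delimiter would lex `Test!` as a single VAR_VALUE.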
This way, notes becomes a keyword in your grammar. If you don't want it to be a keyword, remove the NOTES rule altogether, and use the VAR_VALUE rule with a predicate. Alternatively, you could use lexer modes.
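As a rough sketch of the predicate approach, assuming the Java target: the predicate body is target-language code, so it would need adapting for other targets. The helper rule name `notes` is hypothetical and not part of your grammar.

```antlr
grammar Hello;

prog: stat+ EOF;
stat: DELIMITER_OPEN expr DELIMITER_CLOSE;
expr: notes COMMA value=VAR_VALUE #delim_body;

// Contextual keyword: accept a VAR_VALUE here only if its text is "notes".
// The code inside {...}? is Java and is an assumption about your target.
notes: {_input.LT(1).getText().equals("notes")}? VAR_VALUE;

DELIMITER_OPEN: '<<!';
DELIMITER_CLOSE: '!>>';
COMMA: ',';
VAR_VALUE: ANBang+;    // no NOTES token any more; "notes" lexes as VAR_VALUE

fragment ANBang: AlphaNum | Bang;
fragment AlphaNum: [a-zA-Z0-9];
fragment Bang: '!';

WS: [ \t\r\n]+ -> skip;
```

Here the lexer no longer treats notes specially, so the word can still appear as an ordinary value after the comma; the parser alone decides when it acts as a keyword.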
Upvotes: 2