Reputation: 14393
I am trying to create a very simple ANTLR grammar file that should parse the following file:
Report (MyReport)
Begin
End
Or without report name:
Report
Begin
End
And here is my grammar file:
grammar RL;
options {
language = Java;
}
report:
REPORT ('(' SPACE* STRING_LITERAL SPACE* ')')?
BEGIN
END
;
REPORT
: 'Report'
;
BEGIN
: 'Begin'
;
END : 'End';
NAME: LETTER (LETTER | DIGIT | '_')*;
STRING_LITERAL : NAME SPACE*;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SPACE: ' ' | '\t';
WHITESPACE: SPACE+ { $channel = HIDDEN; };
rule: ;
However, when I debug in ANTLRWorks, I always get the following error:
root -> report -> MismatchedTokenException(0!=0)
What's wrong with my grammar file?
thanks, Green
Upvotes: 1
Views: 609
Reputation: 170288
A couple of remarks:
- Java is the default target language, so you can omit language = Java; from the options block.
- You use SPACE inside a parser rule, while SPACE is a fragment rule. The lexer never creates tokens for fragment rules, so remove SPACE from your parser rule(s).
- "Report " ("Report" followed by a single white-space) is being tokenized as a STRING_LITERAL, not as a REPORT! ANTLR's lexer consumes characters greedily; only when two or more rules match the same number of characters does the rule defined first take precedence. The lexer does not produce the tokens the parser is trying to match: tokenization and parsing are performed independently.

Try the following instead:
grammar RL;
report
: REPORT ('(' NAME ')')? BEGIN END
;
REPORT : 'Report';
BEGIN : 'Begin';
END : 'End';
NAME : LETTER (LETTER | DIGIT | '_')*;
fragment LETTER : LOWER | UPPER;
fragment LOWER : 'a'..'z';
fragment UPPER : 'A'..'Z';
fragment DIGIT : '0'..'9';
SPACE : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };
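To make the longest-match behaviour from the remarks above concrete, here is a minimal Java sketch that imitates it with regular expressions. It is a toy model of the rule as described in this answer (longest match wins, ties go to the rule defined first), not ANTLR's actual lexer implementation. The class name LongestMatchDemo, its helper methods, and the token names LPAREN and RPAREN are made up for illustration; ANTLR generates its own names for literals such as '(' used in parser rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy longest-match tokenizer; an illustration only, not ANTLR's real lexer.
final class LongestMatchDemo {

    // Returns the token type chosen at each position of the input.
    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // A strictly longer match wins; on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            types.add(bestType);
            pos += bestLen;
        }
        return types;
    }

    // Lexer rules of the grammar in the question, in definition order.
    static Map<String, String> originalRules() {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("LPAREN", "\\(");   // hypothetical name for the literal '('
        r.put("RPAREN", "\\)");   // hypothetical name for the literal ')'
        r.put("REPORT", "Report");
        r.put("BEGIN", "Begin");
        r.put("END", "End");
        r.put("NAME", "[A-Za-z][A-Za-z0-9_]*");
        r.put("STRING_LITERAL", "[A-Za-z][A-Za-z0-9_]*[ \\t]*");
        r.put("WHITESPACE", "[ \\t]+");
        return r;
    }

    // Lexer rules of the corrected grammar, in definition order.
    static Map<String, String> fixedRules() {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("LPAREN", "\\(");
        r.put("RPAREN", "\\)");
        r.put("REPORT", "Report");
        r.put("BEGIN", "Begin");
        r.put("END", "End");
        r.put("NAME", "[A-Za-z][A-Za-z0-9_]*");
        r.put("SPACE", "[ \\t\\r\\n]+");
        return r;
    }

    public static void main(String[] args) {
        String input = "Report (MyReport)";
        // prints [STRING_LITERAL, LPAREN, NAME, RPAREN]
        System.out.println(tokenize(originalRules(), input));
        // prints [REPORT, SPACE, LPAREN, NAME, RPAREN]
        System.out.println(tokenize(fixedRules(), input));
    }
}
```

With the original rules, "Report " (including the trailing space) is seven characters long as a STRING_LITERAL but only six as a REPORT, so the keyword token is never produced. With the corrected rules, REPORT and NAME tie at six characters and REPORT wins because it is defined first. The sketch lists SPACE as an ordinary token; in the real grammar it goes to the hidden channel, so the parser never sees it.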
green wrote:
What if I want to allow "SPACE" inside Report NAME?
I would still skip spaces in the lexer. Accepting spaces inside names while ignoring them everywhere else would lead to clunky rules. Instead of handling spaces inside a report's name in the lexer, I would let a parser rule match one or more NAME tokens:
report
: REPORT ('(' report_name ')')? BEGIN END
;
report_name
: NAME+
;
For the input:
Report (a name with spaces) Begin End
this results in a parse tree in which report_name has four NAME children: a, name, with and spaces.
green wrote:
so is it possible to allow me use reserved words like 'Report' in the name?
Sure, explicitly add them to the report_name rule:
report_name
: (NAME | REPORT | BEGIN | END)+
;
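To see what these two parser rules accept, here is a hand-written Java sketch of roughly the same logic. It is not the code ANTLR generates; the class ReportNameSketch and its methods are hypothetical, and it works on plain strings rather than ANTLR tokens. Whitespace is discarded while tokenizing (mirroring the hidden channel), and any word, including Report, Begin and End, is accepted inside the parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written sketch of:
//   report      : REPORT ('(' report_name ')')? BEGIN END ;
//   report_name : (NAME | REPORT | BEGIN | END)+ ;
final class ReportNameSketch {

    // Optional leading whitespace, then either a word or a single parenthesis.
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([A-Za-z][A-Za-z0-9_]*|[()])");

    // Splits the input into words and parentheses, dropping all whitespace.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        String text = input.trim();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
            out.add(m.group(1));
            pos = m.end();
        }
        return out;
    }

    // Returns the words of the report name (an empty list when the name is omitted).
    static List<String> parseReport(String input) {
        List<String> t = tokens(input);
        List<String> name = new ArrayList<>();
        int i = expect(t, 0, "Report");
        if (i < t.size() && t.get(i).equals("(")) {
            i++;
            // report_name: one or more words; keywords count as words here.
            while (i < t.size() && !t.get(i).equals(")") && !t.get(i).equals("(")) {
                name.add(t.get(i++));
            }
            if (name.isEmpty()) {
                throw new IllegalArgumentException("empty report name");
            }
            i = expect(t, i, ")");
        }
        i = expect(t, i, "Begin");
        i = expect(t, i, "End");
        if (i != t.size()) {
            throw new IllegalArgumentException("unexpected trailing input");
        }
        return name;
    }

    // Checks that token i equals `want` and returns the index of the next token.
    private static int expect(List<String> t, int i, String want) {
        if (i >= t.size() || !t.get(i).equals(want)) {
            throw new IllegalArgumentException("expected '" + want + "' at token " + i);
        }
        return i + 1;
    }

    public static void main(String[] args) {
        // prints [a, name, with, spaces]
        System.out.println(parseReport("Report (a name with spaces) Begin End"));
        // prints [End, of, Report]
        System.out.println(parseReport("Report (End of Report) Begin End"));
    }
}
```

Because the name is delimited by parentheses, the keywords inside it are unambiguous: the parser only needs to see whether the next token is ')' to know whether the name continues.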
Upvotes: 3