Gelin Luo

Reputation: 14393

What's wrong with my simple antlr grammar?

I am trying to create a very simple antlr grammar file which should parse the following file:

Report (MyReport)
Begin
End

Or without report name:

Report
Begin
End

And here is my grammar file:

grammar RL;

options {
  language = Java;
}

report:
  REPORT ('(' SPACE* STRING_LITERAL SPACE* ')')?
  BEGIN
  END
  ;

REPORT
    :   'Report'
    ;     

BEGIN
    :   'Begin'
    ;

END :   'End';

NAME:   LETTER (LETTER | DIGIT | '_')*;

STRING_LITERAL :    NAME SPACE*;

fragment LETTER: LOWER | UPPER;

fragment LOWER: 'a'..'z';

fragment UPPER: 'A'..'Z';

fragment DIGIT: '0'..'9';

fragment SPACE: ' ' | '\t';

WHITESPACE: SPACE+ { $channel = HIDDEN; };

rule: ;

However, when I debug in ANTLRWorks, I always get the following error:

 root -> report -> MismatchedTokenException(0!=0)

What's wrong in my Grammar file?

thanks, Green

Upvotes: 1

Views: 609

Answers (1)

Bart Kiers

Reputation: 170288

A couple of remarks:

  • Java is the default target language, so the language=Java; option can be omitted;
  • you're using SPACE inside a parser rule, but SPACE is a fragment rule: fragments are only building blocks for other lexer rules, so the lexer never emits a SPACE token. Remove it from your parser rule(s);
  • the input "Report " ("Report" followed by a single space) is tokenized as a STRING_LITERAL, not as a REPORT! ANTLR's lexer consumes characters greedily: the rule that matches the most characters wins, and only when two or more rules match the same number of characters does the rule defined first take precedence. The lexer also does not take into account which token the parser is trying to match: tokenization and parsing are performed independently.
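
To make the third remark concrete, here is a minimal Python sketch of the "longest match wins, first rule wins on a tie" behaviour. It is a simplified model, not ANTLR's actual implementation: the tokenize helper, the regex translations of the lexer rules, and the LPAREN/RPAREN names for the '(' and ')' literals are all assumptions made for illustration.

```python
import re

def tokenize(text, rules, hidden=()):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches are equally long.
            if match and match.end() - pos > best_len:
                best_name, best_len = name, match.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at offset {pos}")
        if best_name not in hidden:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

NAME = r"[A-Za-z][A-Za-z0-9_]*"
PARENS = [("LPAREN", r"\("), ("RPAREN", r"\)")]  # hypothetical names for '(' and ')'

# Lexer rules from the question, in their original order (fragments inlined).
QUESTION_RULES = PARENS + [
    ("REPORT", "Report"), ("BEGIN", "Begin"), ("END", "End"),
    ("NAME", NAME),
    ("STRING_LITERAL", NAME + r"[ \t]*"),
    ("WHITESPACE", r"[ \t]+"),
]

# Lexer rules from a grammar without STRING_LITERAL, with whitespace hidden.
FIXED_RULES = PARENS + [
    ("REPORT", "Report"), ("BEGIN", "Begin"), ("END", "End"),
    ("NAME", NAME),
    ("SPACE", r"[ \t\r\n]+"),
]

print(tokenize("Report (MyReport)", QUESTION_RULES, hidden=("WHITESPACE",)))
print(tokenize("Report (MyReport)", FIXED_RULES, hidden=("SPACE",)))
```

With the question's rules, the first token comes out as STRING_LITERAL with the text "Report " (seven characters beat the six that REPORT can match), so a parser expecting REPORT fails immediately. Without STRING_LITERAL, REPORT and NAME both match six characters and REPORT wins because it is listed first.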

Try the following instead:

grammar RL;

report
 : REPORT ('(' NAME ')')? BEGIN END
 ;

REPORT : 'Report';     
BEGIN  : 'Begin';
END    : 'End';
NAME   : LETTER (LETTER | DIGIT | '_')*;

fragment LETTER : LOWER | UPPER;
fragment LOWER  : 'a'..'z';
fragment UPPER  : 'A'..'Z';
fragment DIGIT  : '0'..'9';

SPACE  : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };

green wrote:

What if I want to allow "SPACE" inside Report NAME?

I would still skip spaces in the lexer. Accepting spaces inside names but ignoring them everywhere else would make for some clunky rules. Instead of accounting for spaces inside a report's name, I would let the name consist of one or more NAME tokens:

report
 : REPORT ('(' report_name ')')? BEGIN END
 ;

report_name
 : NAME+
 ;

resulting in the following parse tree:

[image: parse tree]

for the input:

Report (a name with spaces)
Begin
End

green wrote:

so is it possible to allow me use reserved words like 'Report' in the name?

Sure, explicitly add them in the report_name rule:

report_name
 : (NAME | REPORT | BEGIN | END)+
 ;
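
The effect of listing the keywords in report_name can be tried out with a small hand-written parser. The following Python sketch is only an approximation of what the generated ANTLR parser does: lex, parse_report and the name_tokens parameter are hypothetical helpers, and whitespace is simply dropped, mirroring a lexer rule on the hidden channel.

```python
import re

KEYWORDS = {"Report": "REPORT", "Begin": "BEGIN", "End": "END"}
TOKEN_RE = re.compile(
    r"(?P<WORD>[A-Za-z][A-Za-z0-9_]*)|(?P<PAREN>[()])|(?P<SPACE>[ \t\r\n]+)"
)

def lex(text):
    """Return (type, text) pairs; whitespace is dropped like a hidden token."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        pos = m.end()
        if m.lastgroup == "SPACE":
            continue
        word = m.group()
        # Keywords take precedence over NAME; parentheses are their own type.
        kind = KEYWORDS.get(word, "NAME") if m.lastgroup == "WORD" else word
        tokens.append((kind, word))
    return tokens

def parse_report(text, name_tokens=("NAME",)):
    """report : REPORT ('(' report_name ')')? BEGIN END ; returns the name words."""
    tokens, pos = lex(text), 0

    def expect(kind):
        nonlocal pos
        found = tokens[pos][0] if pos < len(tokens) else "EOF"
        if found != kind:
            raise SyntaxError(f"expected {kind}, found {found}")
        pos += 1

    expect("REPORT")
    name = []
    if pos < len(tokens) and tokens[pos][0] == "(":
        expect("(")
        # report_name : (...)+ ; collect every token of an allowed type.
        while pos < len(tokens) and tokens[pos][0] in name_tokens:
            name.append(tokens[pos][1])
            pos += 1
        if not name:
            raise SyntaxError("report_name needs at least one token")
        expect(")")
    expect("BEGIN")
    expect("END")
    return name

ALL_WORDS = ("NAME", "REPORT", "BEGIN", "END")
print(parse_report("Report (a name with spaces)\nBegin\nEnd"))
print(parse_report("Report (Monthly Report)\nBegin\nEnd", name_tokens=ALL_WORDS))
```

With the default name_tokens=("NAME",), the input "Report (Monthly Report)" is rejected, because the second word is lexed as REPORT rather than NAME. Passing ALL_WORDS corresponds to the (NAME | REPORT | BEGIN | END)+ alternative and accepts it.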

Upvotes: 3
