Reputation: 9794
I am using the following simple grammar to get an understanding of ANTLR.
grammar Example;
options {
language=Java;
}
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
INT : '0'..'9'+
;
PLUS : '+';
ADDNUM :
INT PLUS INT;
prog : ADDNUM;
When I try running the grammar in ANTLRWorks for the input 1+2, I get the following error in the console:
[16:54:08] Interpreting... [16:54:08] problem matching token at 2:0
NoViableAltException(' '@[1:1: Tokens : ( ID | INT | PLUS | ADDNUM);])
Can anyone please help me understand where I am going wrong?
Upvotes: 0
Views: 161
Reputation: 170148
You probably didn't indicate prog as the starting rule in ANTLRWorks. If you do, it all goes okay.
But you really shouldn't create a lexer rule that matches an expression like you do in ADDNUM: this should be a parser rule:
grammar Example;
prog : addExpr EOF;
addExpr : INT PLUS INT;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
INT : '0'..'9'+;
PLUS : '+';
There are no strict rules about when to use parser, lexer or fragment rules, but here's what they're usually used for:
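A lexer rule is usually the smallest part of a language (a string, a number, an identifier, a comment, etc.). Trying to create a lexer rule from input like 1+2 causes problems because a lexer rule matches one contiguous run of characters, so it no longer matches once the expression is written with spaces, as in 1 + 2.
The expression 1+2 consists of three tokens: INT, PLUS and another INT.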
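To make the difference concrete, here is a rough Java sketch. It is not ANTLR and not generated code: the class name LexerRuleSketch, both regex patterns and the decision to drop blanks are assumptions made only for illustration. The first pattern imitates ADDNUM as a single lexer rule, the second imitates the separate INT and PLUS rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LexerRuleSketch {
    // Imitates the lexer rule "ADDNUM : INT PLUS INT;": one contiguous run of characters.
    private static final Pattern ADDNUM = Pattern.compile("[0-9]+\\+[0-9]+");

    // Imitates the two separate lexer rules INT and PLUS.
    private static final Pattern TOKEN = Pattern.compile("(?<INT>[0-9]+)|(?<PLUS>\\+)");

    // True only if the whole input is exactly digits, '+', digits.
    static boolean matchesAsSingleToken(String input) {
        return ADDNUM.matcher(input).matches();
    }

    // Assumption of this sketch: blanks are dropped before scanning, and any
    // character that is neither a digit nor '+' is silently ignored by find().
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        Matcher m = TOKEN.matcher(input.replace(" ", ""));
        while (m.find()) {
            types.add(m.group("INT") != null ? "INT" : "PLUS");
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(matchesAsSingleToken("1+2"));   // prints true
        System.out.println(matchesAsSingleToken("1 + 2")); // prints false
        System.out.println(tokenTypes("1 + 2"));           // prints [INT, PLUS, INT]
    }
}
```

With the small tokens, both 1+2 and 1 + 2 end up as the same INT PLUS INT sequence, which is what a parser rule like addExpr wants to see.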
A fragment rule is used when you don't want this rule to ever become a "real" token. For example, take the following lexer rules:
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
FLOAT : '0'..'9'+ '.' '0'..'9'+;
INT : '0'..'9'+;
In the rules above, you're using '0'..'9' four times, so you could place that in a separate rule:
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | DIGIT)*;
FLOAT : DIGIT+ '.' DIGIT+;
INT : DIGIT+;
DIGIT : '0'..'9';
But you don't want to ever create a DIGIT token: you only want DIGIT to be used by other lexer rules. In that case, you can create a fragment rule:
ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | DIGIT)*;
FLOAT : DIGIT+ '.' DIGIT+;
INT : DIGIT+;
fragment DIGIT : '0'..'9';
This ensures there will never be a DIGIT token, and you can therefore never use DIGIT in your parser rule(s)!
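If it helps, the fragment idea can be pictured in plain Java as a private helper. This is only a hand-written sketch, not what ANTLR generates: the names FragmentSketch, isDigit, isIdStart and skipDigits are invented here, and skipping blanks between tokens is an assumption. The point is that isDigit is shared by the ID, FLOAT and INT matchers, while the Type enum has no DIGIT entry.

```java
import java.util.ArrayList;
import java.util.List;

final class FragmentSketch {
    // Token types this sketch can emit. There is deliberately no DIGIT entry.
    enum Type { ID, FLOAT, INT }

    // Plays the role of "fragment DIGIT": reused below, never emitted as a token.
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // Returns the index just past the run of digits that starts at 'from'.
    private static int skipDigits(String s, int from) {
        int i = from;
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        return i;
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == ' ') {
                i++; // assumption: blanks only separate tokens
            } else if (isIdStart(c)) {
                i++;
                while (i < input.length()
                        && (isIdStart(input.charAt(i)) || isDigit(input.charAt(i)))) {
                    i++;
                }
                tokens.add(Type.ID + "(" + input.substring(start, i) + ")");
            } else if (isDigit(c)) {
                i = skipDigits(input, i);
                // FLOAT needs a '.' followed by at least one more digit.
                boolean hasFraction = i + 1 < input.length()
                        && input.charAt(i) == '.' && isDigit(input.charAt(i + 1));
                if (hasFraction) {
                    i = skipDigits(input, i + 1);
                    tokens.add(Type.FLOAT + "(" + input.substring(start, i) + ")");
                } else {
                    tokens.add(Type.INT + "(" + input.substring(start, i) + ")");
                }
            } else {
                throw new IllegalArgumentException("no token starts at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x1 3.14 42")); // prints [ID(x1), FLOAT(3.14), INT(42)]
    }
}
```

Even a lone 7 comes out as INT(7) here, never as a DIGIT.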
Parser rules glue the tokens together: they make sure the language is syntactically valid (a.k.a. parsing). To emphasize, parser rules can use other parser rules or lexer rules, but not fragment rules.
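As a rough picture of what "gluing tokens together" means, here is a tiny hand-written recursive-descent sketch of the two parser rules from the grammar above. It is not the code ANTLR generates: the class ParserSketch and its match and accepts helpers are hypothetical, and the token list is assumed to come from some lexer that already produced INT and PLUS tokens.

```java
import java.util.Arrays;
import java.util.List;

final class ParserSketch {
    // Token types the parser sees; EOF stands for "no more tokens".
    enum Type { INT, PLUS, EOF }

    private final List<Type> tokens;
    private int pos = 0;

    ParserSketch(List<Type> tokens) {
        this.tokens = tokens;
    }

    // Consumes the next token if it has the expected type, else reports a syntax error.
    private void match(Type expected) {
        Type actual = pos < tokens.size() ? tokens.get(pos) : Type.EOF;
        if (actual != expected) {
            throw new IllegalStateException(
                    "expected " + expected + " but found " + actual + " at token " + pos);
        }
        pos++;
    }

    // addExpr : INT PLUS INT;
    private void addExpr() {
        match(Type.INT);
        match(Type.PLUS);
        match(Type.INT);
    }

    // prog : addExpr EOF;
    void prog() {
        addExpr();
        match(Type.EOF);
    }

    // Convenience wrapper: true if the token sequence is a valid prog.
    static boolean accepts(Type... types) {
        try {
            new ParserSketch(Arrays.asList(types)).prog();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(Type.INT, Type.PLUS, Type.INT)); // prints true
        System.out.println(accepts(Type.INT, Type.PLUS));           // prints false
    }
}
```

Note that the parser only ever asks for token types. It never looks at individual characters, which is why a fragment like DIGIT has no place in a parser rule.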
Also see: ANTLR: Is there a simple example?
Upvotes: 1