name_masked

Reputation: 9794

Cannot interpret ANTLRWorks output

I am using the following simple grammar to get an understanding of ANTLR.

grammar Example;
options {
language=Java;
}

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT : '0'..'9'+
    ;
PLUS    :   '+';


ADDNUM  :   
    INT PLUS INT;

prog    :    ADDNUM;

When I try running the grammar in ANTLRWorks for the input 1+2, I get the following error in the console:

[16:54:08] Interpreting...
[16:54:08] problem matching token at 2:0
NoViableAltException(' '@[1:1: Tokens : ( ID | INT | PLUS | ADDNUM);])

Can anyone please help me understand where I am going wrong?

Upvotes: 0

Views: 161

Answers (1)

Bart Kiers

Reputation: 170148

You probably didn't indicate prog as the starting rule in ANTLRWorks. If you do, it all goes okay.

But you really shouldn't create a lexer rule that matches an expression like you do in ADDNUM: this should be a parser rule:

grammar Example;

prog    : addExpr EOF;
addExpr : INT PLUS INT;
ID      : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
INT     : '0'..'9'+;
PLUS    : '+';

ANTLR rules

There are no strict rules about when to use parser, lexer or fragment rules, but here's what they're usually used for:

lexer rules

A lexer rule usually matches the smallest part of a language (a string, a number, an identifier, a comment, etc.). Trying to create a lexer rule from input like 1+2 causes problems because:

  • if you ever want to extract something meaningful from that token (evaluate it, for example), you need to split the contents of that token again, because after creating one token from it, the text of the entire expression is "glued" together;
  • you run into problems when there is white space in between: 1 +   2.

The expression 1+2 consists of three tokens: INT, PLUS and another INT.
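
To make the "three tokens" idea concrete, here is a minimal hand-written sketch in Java. It is not the lexer ANTLR generates, and the class name TokenSketch, the method tokenize and the "TYPE:text" string format are made up for this illustration. It also assumes white space is simply skipped, which the grammar above does not do by itself (it has no white space rule).

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only: this is NOT the code ANTLR generates.
// TokenSketch and tokenize are hypothetical names used for this example.
class TokenSketch {

    // Splits the input into "TYPE:text" strings, e.g. "INT:1".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // assumption: white space between tokens is skipped
            } else if (c == '+') {
                tokens.add("PLUS:+");
                i++;
            } else if (c >= '0' && c <= '9') {
                // INT : '0'..'9'+  (consume as many digits as possible)
                int start = i;
                while (i < input.length()
                        && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("INT:" + input.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1+2"));     // [INT:1, PLUS:+, INT:2]
        System.out.println(tokenize("1 +   2")); // [INT:1, PLUS:+, INT:2]
    }
}
```

Because each INT is its own token, the two operands can be read directly, and the spaces in 1 +   2 no longer matter. A single ADDNUM token would hand you the whole text "1+2" to pick apart again.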

fragment rules

A fragment rule is used when you don't want this rule to ever become a "real" token. For example, take the following lexer rules:

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
FLOAT : '0'..'9'+ '.' '0'..'9'+;
INT   : '0'..'9'+;

In the rules above, you're using '0'..'9' four times, so you could place that in a separate rule:

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | DIGIT)*;
FLOAT : DIGIT+ '.' DIGIT+;
INT   : DIGIT+;
DIGIT : '0'..'9';

But you don't want to ever create a DIGIT token: you only want the DIGIT to be used by other lexer rules. In that case, you can create a fragment rule:

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | DIGIT)*;
FLOAT : DIGIT+ '.' DIGIT+;
INT   : DIGIT+;
fragment DIGIT : '0'..'9';

This makes sure there will never be a DIGIT token, so you can never use DIGIT in your parser rule(s)!

parser rules

Parser rules glue the tokens together: they make sure the input is syntactically valid (a.k.a. parsing). To emphasize, parser rules can use other parser rules or lexer rules, but not fragment rules.


Also see: ANTLR: Is there a simple example?

Upvotes: 1
