Fre

Reputation: 13

ANTLR: token recognized as rule, instead of type

I have the following very simple ANTLR grammar:

SPACE                    : [ ]+ -> skip;
NUMBER                   : ('0'..'9')+;

event                    : '1' '|' identifier EOF;
identifier               : NUMBER;

The idea is to parse all inputs of format 1 | <number>.

This works fine for inputs such as 1 | 50, but it fails for 1 | 1. I believe I understand what is going on: the second 1 is recognized as the '1' of the rule event and not as part of the rule identifier, but I am not sure how to fix this.

How do I proceed here?

Upvotes: 1

Views: 198

Answers (1)

Bart Kiers

Reputation: 170158

When you use a literal such as '1' in a parser rule, ANTLR implicitly creates a lexer rule for it. So the rules:

event                    : '1' '|' identifier EOF;
NUMBER                   : ('0'..'9')+;

are really this:

event                    : T_0 T_1 identifier EOF;
T_0                      : '1';
T_1                      : '|';
NUMBER                   : ('0'..'9')+;

And ANTLR's lexer will always create tokens in the following way:

  1. try to match as many characters as possible for each lexer rule
  2. whenever there are 2 or more lexer rules that match the same characters, let the one defined first "win"

So, for the input 1, the token T_0 will always be created (point 2 applies). And for the input 11 the token NUMBER will always be created (point 1 applies).
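To make the two points above concrete, here is a minimal Python sketch that imitates them. It is not ANTLR's actual implementation, only a toy model: the rule list, the SKIPPED set and the tokenize function are made up for illustration, and the rule order simply mirrors the expanded grammar shown above (implicit literal tokens before NUMBER).

```python
import re

# Toy lexer rules in definition order (an assumption that mirrors the
# expanded grammar above: the implicit tokens T_0 and T_1 precede NUMBER).
RULES = [
    ("SPACE", re.compile(r"[ ]+")),
    ("T_0", re.compile(r"1")),
    ("T_1", re.compile(r"\|")),
    ("NUMBER", re.compile(r"[0-9]+")),
]
SKIPPED = {"SPACE"}  # stands in for "-> skip"


def tokenize(text):
    """Return (rule name, matched text) pairs for the given input."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Point 1: a strictly longer match replaces the current best.
            # Point 2: on a tie the earlier rule is kept, so it "wins".
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("1 | 50"))
print(tokenize("1 | 1"))
print(tokenize("11"))
```

With this model, the second 1 in 1 | 1 comes out as T_0, because T_0 and NUMBER both match one character and T_0 is listed first, whereas 50 and 11 come out as NUMBER because NUMBER matches more characters.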

In other words: the input 1 will never become a NUMBER token. If you want a lone 1 to be accepted as an identifier as well, do something like this:

SPACE                    : [ ]+ -> skip;
ONE                      : '1';
NUMBER                   : ('0'..'9')+;

event                    : ONE '|' identifier EOF;
identifier               : number;
number                   : ONE | NUMBER;

Upvotes: 2
