Carl Wang

Reputation: 15

ANTLR parser: ambiguous literal

I have a grammar like this:

grammar a;
rule : cccc direction;
cccc: Char Char Char Char;
direction: Digit Digit Digit 'V' Digit Digit Digit;

Char : [A-Z];
Digit: [0-9];
WS: [ \t\n\r=] ->skip;

I want to parse the string "AVBC 120V230", but I get these errors:

line 1:1 extraneous input 'V' expecting Char
line 1:5 missing Char at '1'

What should I do? Thanks.

Upvotes: 1

Views: 60

Answers (1)

Bart Kiers

Reputation: 170288

When using a literal token inside a parser rule ('V' in your case), ANTLR treats the complete grammar like this:

grammar a;
rule : cccc direction;
cccc: Char Char Char Char;
direction: Digit Digit Digit T__0 Digit Digit Digit;

T__0 : 'V';
Char : [A-Z];
Digit: [0-9];
WS: [ \t\n\r] ->skip; // you included the `=` here: I assumed it was a typo
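
To see where the two error messages come from, here is a minimal Python sketch of the lexer's token choice for the grammar above. It is only a simulation under stated assumptions, not ANTLR's implementation: the names `RULES`, `SKIPPED` and `tokenize` are hypothetical, and the only behavior modeled is that the longest match wins and that, on a tie, the rule defined first wins.

```python
import re

# Lexer rules in the order ANTLR effectively sees them:
# the implicit literal token T__0 comes before Char.
RULES = [
    ("T__0", r"V"),
    ("Char", r"[A-Z]"),
    ("Digit", r"[0-9]"),
    ("WS", r"[ \t\n\r]"),
]
SKIPPED = {"WS"}  # rules marked with "-> skip"


def tokenize(text, rules=RULES):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.match(pattern, text[pos:])
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() > best_len:
                best_name, best_len = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


for name, value in tokenize("AVBC 120V230"):
    print(name, repr(value))
```

The second token comes out as T__0 rather than Char, which is why the parser reports an unexpected 'V' at position 1 while it is still expecting a Char.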

Because T__0 is defined before Char and both match the single character V, ANTLR will always create a T__0 token for the input V. This means V will never be tokenized as a Char token. If you want V to also be accepted as a Char, you'll need to handle this in a parser rule:

grammar a;
rule : cccc direction;
cccc: ch ch ch ch;
direction: Digit Digit Digit V Digit Digit Digit;
ch : V | Char;

V : 'V';
Char : [A-Z];
Digit: [0-9];
WS: [ \t\n\r] ->skip;

With this grammar, your input AVBC 120V230 is parsed properly.
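
As a rough illustration of why the `ch` rule fixes the problem, the following Python sketch checks a list of token types against the parser rules of the grammar above. It is a hand-written check, not generated ANTLR code; `is_ch` and `matches_rule` are hypothetical helper names, and the token list is written out by hand as the types the lexer would produce for the input.

```python
def is_ch(token_type):
    # ch : V | Char;
    return token_type in ("V", "Char")


def matches_rule(token_types):
    """rule : cccc direction;
    cccc      : ch ch ch ch;
    direction : Digit Digit Digit V Digit Digit Digit;
    """
    expected_direction = ["Digit"] * 3 + ["V"] + ["Digit"] * 3
    return (
        len(token_types) == 11
        and all(is_ch(t) for t in token_types[:4])
        and token_types[4:] == expected_direction
    )


# Token types for "AVBC 120V230" (whitespace is skipped by the lexer).
tokens = ["Char", "V", "Char", "Char",
          "Digit", "Digit", "Digit", "V", "Digit", "Digit", "Digit"]
print(matches_rule(tokens))
```

The V inside AVBC is still lexed as a V token, but the parser now accepts it wherever a `ch` is expected.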

Note that I don't know what your language looks like, but letting the lexer tokenize just single bytes/chars and gluing them together in parser rules seems a bit odd. While this is possible, in ANTLR you generally define tokens with a bit more substance. From your example, I'd say defining an Identifier and a Direction token is more appropriate:

rule : cccc direction;
cccc: Identifier;
direction: Direction;

Identifier : [A-Z]+;
Direction : [0-9] [0-9] [0-9] 'V' [0-9] [0-9] [0-9];
WS: [ \t\n\r] ->skip;
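
For comparison, here is a small Python sketch of what the token stream looks like with these coarser tokens. It uses a regular expression as a stand-in for the lexer, so it is only an approximation: `TOKEN_RE` and `tokenize` are hypothetical names, and regex alternation picks the first alternative that matches rather than the longest one. That difference does not matter here, because a Direction always starts with a digit and an Identifier always starts with a letter.

```python
import re

# One named group per lexer rule of the grammar above.
TOKEN_RE = re.compile(
    r"(?P<Direction>[0-9]{3}V[0-9]{3})"
    r"|(?P<Identifier>[A-Z]+)"
    r"|(?P<WS>[ \t\n\r]+)"
)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.lastgroup != "WS":  # WS is skipped, as in the grammar
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("AVBC 120V230"))
```

The input now yields just two tokens, an Identifier and a Direction, so the V in either part no longer needs special handling in the parser.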

Upvotes: 1
