thiagoh

Reputation: 7378

ANTLR4 grammar errors when parsing

I have the following grammar:

grammar Token;

prog: (expr NL?)+ EOF;

expr: '[' type ']';

type : typeid ':' value;

typeid : 'TXT' | 'ENC' | 'USR';

value: Text | INT;

INT :   '0' | [1-9] [0-9]*;

//WS : [ \t]+;
WS  :   [ \t\n\r]+ -> skip ;
NL:  '\r'? '\n';
Text : ~[\]\[\n\r"]+ ;

The text I need to parse looks something like this:

[TXT:look at me!]
[USR:19700]
[TXT:, can I go there?]
[ENC:124124]
[TXT:this is needed for you to go...]

I need to split this text, but I am getting errors when I run grun.bat Token prog -gui -trace -diagnostics:

enter   prog, LT(1)=[
enter   expr, LT(1)=[
consume [@0,0:0='[',<3>,1:0] rule expr
enter   type, LT(1)=TXT:look at me!
enter   typeid, LT(1)=TXT:look at me!
line 1:1 mismatched input 'TXT:look at me!' expecting {'TXT', 'ENC', 'USR'}
... much more ...


What is wrong with my grammar? Please help me!

Upvotes: 1

Views: 316

Answers (1)

Bart Kiers

Reputation: 170128

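You must understand that tokens are not created based on what the parser is trying to match. The lexer runs independently of the parser and tries to match as many characters as possible. For your input, the Text rule matches all of TXT:look at me! as a single token, which is longer than the 'TXT' literal, so the parser never receives a 'TXT' token. That is exactly what the error on line 1:1 reports. Your Text token should therefore be defined differently.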

You could turn Text into a parser rule instead and have the lexer match single-character tokens, like this:

grammar Token;

prog   : expr+ EOF;
expr   : '[' type ']';
type   : typeid ':' value;
typeid : 'TXT' | 'ENC' | 'USR';
value  : text | INT;
text   : CHAR+;

INT  : '0' | [1-9] [0-9]*;
WS   : [ \t\n\r]+ -> skip ;
CHAR : ~[\[\]\r\n];
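
To make the longest-match behaviour concrete, here is a rough Python sketch that imitates it for both versions of the lexer rules. It is not ANTLR and not how ANTLR is implemented: the `tokenize` helper and the regex translations of the rules are my own assumptions, and the only thing it models is "longest match wins, ties go to the rule defined first" (with the literal tokens from the parser rules placed before the named lexer rules).

```python
import re

# Literal tokens that appear in the parser rules; assumed to be defined
# before the named lexer rules, so they win ties of equal length.
LITERALS = [("'['", r"\["), ("']'", r"\]"), ("':'", r":"),
            ("'TXT'", r"TXT"), ("'ENC'", r"ENC"), ("'USR'", r"USR")]

# Hand-translated regex versions of the lexer rules in the question.
ORIGINAL = LITERALS + [
    ("INT", r"0|[1-9][0-9]*"),
    ("WS", r"[ \t\n\r]+"),
    ("NL", r"\r?\n"),
    ("Text", r'[^\]\[\n\r"]+'),
]

# Hand-translated regex versions of the lexer rules in the answer.
REVISED = LITERALS + [
    ("INT", r"0|[1-9][0-9]*"),
    ("WS", r"[ \t\n\r]+"),
    ("CHAR", r"[^\[\]\r\n]"),
]


def tokenize(rules, text):
    """Hypothetical maximal-munch lexer: longest match wins, first rule wins ties."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is marked '-> skip' in both grammars
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


line = "[TXT:look at me!]"
print(tokenize(ORIGINAL, line))
print(tokenize(REVISED, line))
```

With the original rules the sketch yields '[', one Text token containing TXT:look at me!, and ']', which matches the token shown in your trace. With the revised rules it yields '[', 'TXT', ':' and then one CHAR per character. Note that in this simulation the spaces go to the skipped WS rule (it ties with CHAR at length 1 and is defined first), so the CHAR tokens spell lookatme! without whitespace. If you need the spaces preserved, that is worth checking against the real lexer.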

Upvotes: 1
