Reputation: 7378
I have the following grammar:
grammar Token;
prog: (expr NL?)+ EOF;
expr: '[' type ']';
type : typeid ':' value;
typeid : 'TXT' | 'ENC' | 'USR';
value: Text | INT;
INT : '0' | [1-9] [0-9]*;
//WS : [ \t]+;
WS : [ \t\n\r]+ -> skip ;
NL: '\r'? '\n';
Text : ~[\]\[\n\r"]+ ;
and the text I need to parse looks something like this:
[TXT:look at me!]
[USR:19700]
[TXT:, can I go there?]
[ENC:124124]
[TXT:this is needed for you to go...]
I need to split this text, but I am getting some errors when I run grun.bat Token prog -gui -trace -diagnostics:
enter prog, LT(1)=[
enter expr, LT(1)=[
consume [@0,0:0='[',<3>,1:0] rule expr
enter type, LT(1)=TXT:look at me!
enter typeid, LT(1)=TXT:look at me!
line 1:1 mismatched input 'TXT:look at me!' expecting {'TXT', 'ENC', 'USR'}
... much more ...
What is wrong with my grammar? Please help me!
Upvotes: 1
Views: 316
Reputation: 170128
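You must understand that tokens are not created based on what the parser is trying to match. The lexer tries to match as many characters as possible, independently of the parser. For the input TXT:look at me!, the Text rule matches all 15 characters while the literal 'TXT' matches only 3, so the lexer emits a single Text token and the parser never sees 'TXT'. Your Text token should therefore be defined differently.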
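The longest-match behaviour can be reproduced outside ANTLR. The following is only a rough Python sketch, not ANTLR's actual implementation: `RULES` and `tokenize` are hypothetical names, the regular expressions are hand translations of the question's lexer rules, and it assumes that literal tokens taken from parser rules win over named rules when two matches have the same length.

```python
import re

# Hand-translated lexer rules of the question's grammar, in priority order.
# Assumption: literals used in parser rules are tried before named rules.
RULES = [
    ("'['", r"\["),
    ("']'", r"\]"),
    ("':'", r":"),
    ("'TXT'", r"TXT"),
    ("'ENC'", r"ENC"),
    ("'USR'", r"USR"),
    ("INT", r"0|[1-9][0-9]*"),
    ("WS", r"[ \t\n\r]+"),
    ("NL", r"\r?\n"),
    ("Text", r'[^\]\[\n\r"]+'),
]
SKIPPED = {"WS"}  # WS is marked "-> skip" in the grammar


def tokenize(text):
    """Emit (rule, lexeme) pairs, always taking the longest match."""
    compiled = [(name, re.compile(pattern)) for name, pattern in RULES]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in compiled:
            m = rx.match(text, pos)
            # Strictly longer wins; on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("[TXT:look at me!]"))
# [("'['", '['), ('Text', 'TXT:look at me!'), ("']'", ']')]
```

The single Text token in the middle matches the LT(1)=TXT:look at me! shown in the trace: the literal 'TXT' is never produced, which is why typeid fails.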
You could make text a parser rule instead, and have it match single-character tokens, like this:
grammar Token;
prog : expr+ EOF;
expr : '[' type ']';
type : typeid ':' value;
typeid : 'TXT' | 'ENC' | 'USR';
value : text | INT;
text : CHAR+;
INT : '0' | [1-9] [0-9]*;
WS : [ \t\n\r]+ -> skip ;
CHAR : ~[\[\]\r\n];
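To get a feel for which tokens the revised lexer rules produce, here is a small Python simulation of longest-match tokenizing. It is a sketch under assumptions, not what ANTLR runs internally: `REVISED_RULES` and `tokenize_revised` are hypothetical names, the patterns are hand translations of the rules above, and literal tokens are assumed to win ties against named rules.

```python
import re

# Hand-translated lexer rules of the revised grammar, in priority order.
# Assumption: literal tokens come first, then INT, WS, CHAR.
REVISED_RULES = [
    ("'['", r"\["), ("']'", r"\]"), ("':'", r":"),
    ("'TXT'", r"TXT"), ("'ENC'", r"ENC"), ("'USR'", r"USR"),
    ("INT", r"0|[1-9][0-9]*"),
    ("WS", r"[ \t\n\r]+"),     # skipped, as in the grammar
    ("CHAR", r"[^\[\]\r\n]"),  # exactly one character
]


def tokenize_revised(text):
    """Emit (rule, lexeme) pairs using longest match, first rule wins ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (rule name, end offset)
        for name, pattern in REVISED_RULES:
            m = re.compile(pattern).match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, end = best
        if name != "WS":
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


for line in ["[TXT:look at me!]", "[USR:19700]"]:
    print([name for name, _ in tokenize_revised(line)])
```

In this simulation 'TXT' and 'USR' now come out as their own tokens because CHAR can only ever match one character, and [USR:19700] yields '[', 'USR', ':', INT, ']'. Note that spaces inside the text are matched by WS and skipped, so the CHAR tokens for the first line spell lookatme!; if the whitespace matters, the WS rule would need to be revisited.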
Upvotes: 1