Reputation: 28662
I have written the following grammar for parsing a DateTime from a given string:
datetime: INT SEPARATOR month SEPARATOR INT4
| INT SEPARATOR month SEPARATOR INT4;
month:
JAN
| FEB
| MAR
| APR
| MAY
| JUN
| JUL
| AUG
| SEP
| OCT
| NOV
| DEC;
STRING: [a-zA-Z][a-zA-Z]+;
NUMBER: [0-9]+;
INT4: DIGIT DIGIT DIGIT DIGIT;
INT: DIGIT+;
DIGIT: ['0'-'9'];
DQUOTE : '"';
JAN: [Jj][Aa][Nn];
FEB: [Ff][Ee][Bb];
MAR: [Mm][Aa][Rr];
APR: [Aa][Pp][Rr];
MAY: [Mm][Aa][Yy];
JUN: [Jj][Uu][Nn];
JUL: [Jj][Uu][Ll];
AUG: [Aa][Uu][Gg];
SEP: [Ss][Ee][Pp];
OCT: [Oo][Cc][Tt];
NOV: [Nn][Oo][Vv];
DEC: [Dd][Ee][Cc];
SEPARATOR: '-';
WS: [ \n\t\r]+ -> skip;
When I try to match the following string:
new teatime at 23-SEP-2013 for Santosh Singh and 3 guests
I get the following error in the ANTLR output:
line 1:15 mismatched input '23' expecting INT
Upvotes: 1
Views: 93
Reputation: 170227
First, the rule `DIGIT: ['0'-'9'];` is incorrect. It should be `DIGIT: [0-9];`, because quotes inside a character set are matched as literal characters.
Whenever you get unexpected results, start by dumping the tokens your lexer is creating to see if they are the tokens you expect your parser to work with. For your grammar, that would be the following tokens:
STRING `new`
STRING `teatime`
STRING `at`
NUMBER `23`
SEPARATOR `-`
STRING `SEP`
SEPARATOR `-`
NUMBER `2013`
STRING `for`
STRING `Santosh`
STRING `Singh`
STRING `and`
NUMBER `3`
STRING `guests`
As you can see, there are a couple of things going wrong:
No `INT` tokens are ever created, while your parser expects such tokens. This is because of the following rules (and their order):
NUMBER : [0-9]+;
INT4 : DIGIT DIGIT DIGIT DIGIT;
INT : DIGIT+;
DIGIT : [0-9];
For the input `3`, the rules `NUMBER`, `INT` and `DIGIT` can all match. Whenever more than one lexer rule matches the same number of characters, the rule defined first "wins". So a single digit, or any run of digits, will always become a `NUMBER` token. `INT4`, `INT` and `DIGIT` tokens will never be created, regardless of which of these tokens the parser is trying to match. The lexer works independently of the parser, so the parser cannot influence this.
The month name `SEP` is tokenized as a `STRING` token. This is the same issue as above: the text `SEP` can be matched by both the `STRING` rule and the `SEP` rule, but since `STRING` is defined before `SEP`, `STRING` "wins".
Reordering the grammar a bit like this:
grammar T;
parse
: (datetime | text)+ EOF
;
text
: STRING
| month
| INT
;
datetime
: INT SEPARATOR month SEPARATOR INT4
| INT SEPARATOR month SEPARATOR INT4
;
month
: JAN
| FEB
| MAR
| APR
| MAY
| JUN
| JUL
| AUG
| SEP
| OCT
| NOV
| DEC
;
JAN : [Jj][Aa][Nn];
FEB : [Ff][Ee][Bb];
MAR : [Mm][Aa][Rr];
APR : [Aa][Pp][Rr];
MAY : [Mm][Aa][Yy];
JUN : [Jj][Uu][Nn];
JUL : [Jj][Uu][Ll];
AUG : [Aa][Uu][Gg];
SEP : [Ss][Ee][Pp];
OCT : [Oo][Cc][Tt];
NOV : [Nn][Oo][Vv];
DEC : [Dd][Ee][Cc];
STRING : [a-zA-Z][a-zA-Z]+;
INT4 : DIGIT DIGIT DIGIT DIGIT;
INT : DIGIT+;
DQUOTE : '"';
SEPARATOR : '-';
WS: [ \n\t\r]+ -> skip;
fragment DIGIT : [0-9];
should match your input correctly.
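To see why the order matters, here is a minimal sketch in plain Python that imitates the tie-breaking described above: at each position the longest match wins, and when lengths tie, the rule listed first wins. This is not ANTLR. The `tokenize` helper and the `ORIGINAL` and `REORDERED` rule lists are hypothetical names, the rule bodies are hand-translated to regular expressions, and `DIGIT` and `DQUOTE` are left out because they never affect this input.

```python
import re

# Month rules written the same way as in the grammar, e.g. SEP -> [Ss][Ee][Pp].
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTH_RULES = [(m, "".join(f"[{c}{c.lower()}]" for c in m)) for m in MONTHS]

# Rule order of the grammar in the question (DIGIT and DQUOTE omitted).
ORIGINAL = [
    ("STRING", r"[a-zA-Z][a-zA-Z]+"),
    ("NUMBER", r"[0-9]+"),
    ("INT4", r"[0-9]{4}"),
    ("INT", r"[0-9]+"),
    *MONTH_RULES,
    ("SEPARATOR", r"-"),
]

# Rule order of the reordered grammar: months first, no NUMBER rule.
REORDERED = [
    *MONTH_RULES,
    ("STRING", r"[a-zA-Z][a-zA-Z]+"),
    ("INT4", r"[0-9]{4}"),
    ("INT", r"[0-9]+"),
    ("SEPARATOR", r"-"),
]

def tokenize(text, rules):
    """Return (rule name, matched text) pairs for a toy longest-match lexer."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # mirrors `WS -> skip`
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

text = "new teatime at 23-SEP-2013 for Santosh Singh and 3 guests"
before = tokenize(text, ORIGINAL)
after = tokenize(text, REORDERED)

# Token types for the `23-SEP-2013` part only.
print([name for name, _ in before][3:8])
# ['NUMBER', 'SEPARATOR', 'STRING', 'SEPARATOR', 'NUMBER']
print([name for name, _ in after][3:8])
# ['INT', 'SEPARATOR', 'SEP', 'SEPARATOR', 'INT4']
```

With the original order the date is lexed as `NUMBER SEPARATOR STRING SEPARATOR NUMBER`, which the `datetime` rule cannot match. With the new order it becomes `INT SEPARATOR SEP SEPARATOR INT4`, which is the sequence `datetime` expects.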
Upvotes: 2