Reputation: 2064
I have some data that needs to be parsed. I am using the ANTLR4 tool to generate a Java parser and lexer, which I can use to build structured data from the input given below.
Grammar:
grammar SUBDATA;
subdata:
data+;
data:
array;
array:
'[' obj (',' obj)* ']';
intarray:
'[' number (',' number)* ']';
number:
INT;
obj:
'{' pair (',' pair)* '}';
pair:
key '=' value;
key:
WORD;
value:
INT | WORD | intarray;
WORD:
[A-Za-z0-9]+;
INT:
[0-9]+;
WS:
[ \t\n\r]+ -> skip;
Test Input Data:
[
{OmedaDemographicType=1, OmedaDemographicId=100, OmedaDemographicValue=4},
{OmedaDemographicType=1, OmedaDemographicId=101, OmedaDemographicValue=26},
{
OmedaDemographicType=2, OmedaDemographicId=102, OmedaDemographicValue=[16,34]
}
]
Output:
line 5:79 mismatched input '16' expecting INT
line 5:82 mismatched input '34' expecting INT
The parser fails even though there is an integer value at each of the reported positions.
Upvotes: 1
Views: 77
Reputation: 51330
You've made the classic mistake of not ordering your lexer rules properly. You should read and understand the priority rules and their consequences.
In your case, INT will never match, because the WORD rule can match everything the INT rule can and it is defined first in the grammar. When two lexer rules match the same longest input, ANTLR picks the one defined first. The 16 and 34 from the example are therefore WORDs.
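To see why the order matters, here is a minimal sketch of the tie-breaking behaviour in plain Java. It is not ANTLR's implementation, only a simulation with `java.util.regex`: the class `LexerPriorityDemo` and the method `winningRule` are hypothetical names made up for illustration, and the two patterns are copied from the lexer rules in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerPriorityDemo {
    // Returns the name of the rule that wins at the start of `input`:
    // the longest match wins, and on a tie the rule declared first wins.
    // Returns null if no rule matches.
    static String winningRule(Map<String, String> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' keeps the earlier rule when lengths are equal.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, standing in for
        // the order in which the rules appear in the grammar.
        Map<String, String> original = new LinkedHashMap<>();
        original.put("WORD", "[A-Za-z0-9]+");
        original.put("INT", "[0-9]+");

        System.out.println(winningRule(original, "16")); // prints WORD
    }
}
```

Both patterns match all of `16`, so the lengths tie and the rule listed first, WORD, takes the token. The parser then sees a WORD where `intarray` expects an INT.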
You should remove the ambiguity by not allowing a word to start with a digit:
WORD:
[A-Za-z] [A-Za-z0-9]*;
INT:
[0-9]+;
Or by swapping the order of the rules:
INT:
[0-9]+;
WORD:
[A-Za-z0-9]+;
In this case, a purely numeric token is always an INT, so you can't have words that are fully numeric, but words can still start with a digit: 16abc is still a single WORD, because the longest match wins.
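The difference between the two fixes shows up on input such as `16abc`. The following is a rough simulation in plain Java, not ANTLR itself: `LexerFixComparison`, `tokenize` and `rules` are hypothetical helpers, whitespace skipping is left out, and it assumes the lexer always takes the longest match and falls back to rule order on a tie.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerFixComparison {
    // Splits `input` into tokens: at each position the longest match wins,
    // and on a tie the rule declared first wins.
    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length()); // match only from the current position
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    // Builds an ordered rule table from name/pattern pairs.
    static Map<String, String> rules(String... nameAndPattern) {
        Map<String, String> table = new LinkedHashMap<>();
        for (int i = 0; i < nameAndPattern.length; i += 2) {
            table.put(nameAndPattern[i], nameAndPattern[i + 1]);
        }
        return table;
    }

    public static void main(String[] args) {
        // Fix 1: a WORD may not start with a digit.
        Map<String, String> fix1 = rules("WORD", "[A-Za-z][A-Za-z0-9]*", "INT", "[0-9]+");
        // Fix 2: same patterns as the question, but INT is declared first.
        Map<String, String> fix2 = rules("INT", "[0-9]+", "WORD", "[A-Za-z0-9]+");

        System.out.println(tokenize(fix1, "16abc")); // prints [INT(16), WORD(abc)]
        System.out.println(tokenize(fix2, "16abc")); // prints [WORD(16abc)]
        System.out.println(tokenize(fix2, "16"));    // prints [INT(16)]
    }
}
```

Which fix is preferable depends on whether keys or values such as `16abc` should be a single token in your data.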
Upvotes: 2