ANTLR4 grammar not behaving as expected

Question

I have some data that needs to be parsed. I am using the ANTLR4 tool to generate a Java lexer and parser that turn the input data given below into structured data.

Grammar:

grammar SUBDATA;
subdata:
    data+;
data:
    array;
array:
    '[' obj (',' obj)* ']';
intarray:
    '[' number (',' number)* ']';
number:
    INT;
obj:
    '{' pair (',' pair)* '}';
pair:
    key '=' value;
key:
    WORD;
value:
    INT | WORD | intarray;
WORD:
    [A-Za-z0-9]+;
INT:
    [0-9]+;
WS:
    [ \t\r\n]+ -> skip;

Test Input Data:

[
    {OmedaDemographicType=1, OmedaDemographicId=100, OmedaDemographicValue=4}, 
    {OmedaDemographicType=1, OmedaDemographicId=101, OmedaDemographicValue=26}, 
    {
        OmedaDemographicType=2, OmedaDemographicId=102, OmedaDemographicValue=[16,34]
    }
]

Output:

line 5:79 mismatched input '16' expecting INT
line 5:82 mismatched input '34' expecting INT

The parser fails even though there is an integer at each of the reported positions.

Lucas Trzesniewski · Accepted Answer

You've made the classic mistake of not ordering your lexer rules properly. You should read and understand the priority rules and their consequences.

In your case, INT will never be able to match: the lexer picks the rule that matches the longest input and, on a tie, the rule defined first in the grammar. WORD can match everything INT can and is defined first, so the 16 and 34 from the example are WORDs.
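
To see the tie-break in isolation, here is a minimal Java sketch that imitates the two lexer rules with java.util.regex. It is not ANTLR's implementation: the class LexerPrioritySketch and its helpers ruleFor and originalRules are made up for illustration, and it assumes the usual ANTLR behavior of preferring the longest match and, on equal length, the rule listed first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the lexer's rule selection, not ANTLR code.
class LexerPrioritySketch {

    // Returns the name of the rule that would claim the token at the start of `input`,
    // or null if no rule matches there.
    static String ruleFor(Map<String, String> rulesInOrder, String input) {
        String winner = null;
        int longest = 0;
        for (Map.Entry<String, String> rule : rulesInOrder.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // lookingAt() anchors the match at the start of the input.
            // The strict '>' keeps the earlier rule when two rules match the same length.
            if (m.lookingAt() && m.end() > longest) {
                longest = m.end();
                winner = rule.getKey();
            }
        }
        return winner;
    }

    // The two token rules in the order they appear in the question's grammar.
    static Map<String, String> originalRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("WORD", "[A-Za-z0-9]+");
        rules.put("INT", "[0-9]+");
        return rules;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"16", "34", "OmedaDemographicId"}) {
            System.out.println(text + " -> " + ruleFor(originalRules(), text));
        }
    }
}
```

Every input comes out as WORD, which is why the number rule in intarray never sees an INT token.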

You should remove the ambiguity by not allowing a word to start with a digit:

WORD:
    [A-Za-z] [A-Za-z0-9]*;
INT:
    [0-9]+;

Or by swapping the order of the rules:

INT:
    [0-9]+;
WORD:
    [A-Za-z0-9]+;

In this case, a fully numeric token is always an INT, so you can't have words that are fully numeric, but words will still be able to start with a digit (1a is a WORD).
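
The difference between the two fixes shows up on input such as 1a. The following Java sketch tokenizes a few strings under both rule sets using java.util.regex. It only imitates the lexer under the assumption of longest match first, then definition order; the class LexerFixComparison and the helpers tokenize, letterFirst and intFirst are hypothetical names, not part of ANTLR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparison of the two suggested fixes, not ANTLR code.
class LexerFixComparison {

    // Splits `input` into tokens formatted as RULE(text).
    static List<String> tokenize(Map<String, String> rulesInOrder, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String winner = null;
            int bestEnd = pos;
            for (Map.Entry<String, String> rule : rulesInOrder.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // Longest match wins; strict '>' keeps the earlier rule on a tie.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end();
                    winner = rule.getKey();
                }
            }
            if (winner == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(winner + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    // Fix 1: a WORD must start with a letter.
    static Map<String, String> letterFirst() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("WORD", "[A-Za-z][A-Za-z0-9]*");
        rules.put("INT", "[0-9]+");
        return rules;
    }

    // Fix 2: INT is defined before the unchanged WORD rule.
    static Map<String, String> intFirst() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("INT", "[0-9]+");
        rules.put("WORD", "[A-Za-z0-9]+");
        return rules;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"16", "1a", "abc"}) {
            System.out.println(text + "  fix 1: " + tokenize(letterFirst(), text)
                    + "  fix 2: " + tokenize(intFirst(), text));
        }
    }
}
```

Under the first fix, 1a splits into INT(1) followed by WORD(a); under the second it stays a single WORD(1a). Both treat 16 as an INT, so either one lets intarray match.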
