zafar

Reputation: 2064

ANTLR4 grammar not behaving as expected

I have some data that needs to be parsed. I am using the ANTLR4 tool to generate a Java lexer and parser, which I want to use to build structured data from the input given below.

Grammar:

grammar SUBDATA;
subdata:
    data+;
data:
    array;
array:
    '[' obj (',' obj)* ']';
intarray:
    '[' number (',' number)* ']';
number:
    INT;
obj:
    '{' pair (',' pair)* '}';
pair:
    key '=' value;
key:
    WORD;
value:
    INT | WORD | intarray;
WORD:
    [A-Za-z0-9]+;
INT:
    [0-9]+;
WS:
    [ \t\n\r]+ -> skip;

Test Input Data:

[
    {OmedaDemographicType=1, OmedaDemographicId=100, OmedaDemographicValue=4}, 
    {OmedaDemographicType=1, OmedaDemographicId=101, OmedaDemographicValue=26}, 
    {
        OmedaDemographicType=2, OmedaDemographicId=102, OmedaDemographicValue=[16,34]
    }
]

Output:

line 5:79 mismatched input '16' expecting INT
line 5:82 mismatched input '34' expecting INT

GUI parse tree output

The parser fails even though there are integer values at the reported positions.

Upvotes: 1

Views: 77

Answers (1)

Lucas Trzesniewski

Reputation: 51330

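You've made the classic mistake of not ordering your lexer rules properly. You should read and understand the lexer priority rules and their consequences: the lexer picks the rule that matches the longest stretch of input, and when several rules match the same length, the one defined first in the grammar wins.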

In your case, INT will never be able to match, since the WORD rule can match everything the INT rule can and is defined first in the grammar. The 16 and 34 from the example are lexed as WORD tokens.

You should remove the ambiguity by not allowing a word to start with a digit:

WORD:
    [A-Za-z] [A-Za-z0-9]*;
INT:
    [0-9]+;

Or by swapping the order of the rules:

INT:
    [0-9]+;
WORD:
    [A-Za-z0-9]+;

With this order, a fully numeric token is always an INT, so you can't have words that are fully numeric, but words will still be able to start with a digit (1abc is still a WORD).
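
To make the effect of rule order concrete, here is a rough Java sketch that imitates the two priority rules (longest match, then first-defined rule) with java.util.regex. It is not how ANTLR's lexer is implemented, and the class name LexerPriorityDemo and the helpers winningRule and rules are made up for this illustration. It only reports which rule would claim the first token of a string under each of the three rule sets discussed here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: a simulation of the priority rules, not ANTLR code.
public class LexerPriorityDemo {

    // Returns the name of the rule that claims the token at the start of `input`:
    // the longest match wins, and on a tie the rule listed earliest wins.
    // (For these simple character-class patterns, a greedy regex match is the longest match.)
    public static String winningRule(Map<String, String> rules, String input) {
        String winner = null;
        int bestLength = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // lookingAt() anchors at the start of the input; the strict '>' keeps
            // the earlier rule when a later rule matches the same length.
            if (m.lookingAt() && m.end() > bestLength) {
                bestLength = m.end();
                winner = rule.getKey();
            }
        }
        return winner;
    }

    // Builds an ordered two-rule set; insertion order stands in for grammar order.
    public static Map<String, String> rules(String name1, String regex1,
                                            String name2, String regex2) {
        Map<String, String> ordered = new LinkedHashMap<>();
        ordered.put(name1, regex1);
        ordered.put(name2, regex2);
        return ordered;
    }

    // The grammar from the question: WORD first, and WORD may be all digits.
    public static final Map<String, String> ORIGINAL =
            rules("WORD", "[A-Za-z0-9]+", "INT", "[0-9]+");
    // First fix: WORD must start with a letter.
    public static final Map<String, String> LETTER_FIRST =
            rules("WORD", "[A-Za-z][A-Za-z0-9]*", "INT", "[0-9]+");
    // Second fix: same patterns as the question, INT defined first.
    public static final Map<String, String> SWAPPED =
            rules("INT", "[0-9]+", "WORD", "[A-Za-z0-9]+");

    public static void main(String[] args) {
        System.out.println(winningRule(ORIGINAL, "16"));      // WORD
        System.out.println(winningRule(LETTER_FIRST, "16"));  // INT
        System.out.println(winningRule(SWAPPED, "16"));       // INT
        System.out.println(winningRule(SWAPPED, "1abc"));     // WORD (longer match beats INT)
        System.out.println(winningRule(LETTER_FIRST, "1abc")); // INT (only the leading "1")
    }
}
```

The last two lines show the practical difference between the two fixes: with the swapped order 1abc is a single WORD, while with the letter-first WORD rule the lexer would split it into an INT followed by a WORD.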

Upvotes: 2
