Alexis Winters

Reputation: 505

Why is ANTLR4 not recognizing tokens as part of rules in my grammar?

I am using ANTLR4 to parse .eds files. I wrote a grammar, but the parser attaches every token in a section's body directly to the body node instead of grouping the tokens into fields. It looks as if ANTLR4 is ignoring my grammar rules for the body.

Here is my grammar:

grammar test;

eds                 :   section+;

section             :   header body;

header              :   '[' header_name ']';

body                :   field+;

field               :   name '=' value STMTEND;

header_name         :   ~(']')+;

name                :   Identifier;

raw_value           :   string
                    |   integer
                    |   hex
                    |   version
                    |   date
                    |   time;

value               :   raw_value
                    |   list;

list                :   raw_value list_value+;

list_value          :   ',' raw_value
                    |   ',';

string              :   String_standard
                    |   string_list;

string_list         :   String_standard string_list
                    |   String_standard String_standard;

integer             :   Integer;
version             :   Version;
date                :   Date;
time                :   Time;
hex                 :   Hex;

String_standard     :   '"' ( Escape | ~('\'' | '\\' | '\n' | '\r') | '.' | '+' + '/' | ' ') + '"';
 
Escape              :   '\\' ( '\'' | '\\' );

Integer             :   NUMBER+;

Hex                 :   '0' 'x' HEX_DIGIT+;

Version             :   NUMBER+ '.' NUMBER+
                    |   NUMBER+ '.' NUMBER+ '.' NUMBER+
                    |   NUMBER+ '.' NUMBER+ '.' NUMBER+ '.' NUMBER+;

Date                :   NUMBER NUMBER '-' NUMBER NUMBER '-' NUMBER NUMBER NUMBER NUMBER;

Time                :   NUMBER NUMBER ':' NUMBER NUMBER ':' NUMBER NUMBER;

Identifier          :   Identifier_Char+;

HeaderID            :   Header_Char+;

fragment 
Identifier_Char     :   LETTER
                    |   NUMBER
                    |   '_';

fragment
Header_Char         :   LETTER
                    |   NUMBER 
                    |   '_' 
                    |   ' ';


fragment LETTER              :   [a-zA-Z];

fragment HEX_DIGIT           :   [a-fA-F0-9];

fragment NUMBER              :   [0-9];

STMTEND             :   SEMICOLON;

fragment SEMICOLON : ';';
fragment NEWLINE   : '\r' '\n' | '\n' | '\r';

WS:                 [ \t\r\n\u000C]+ -> channel(HIDDEN);
LINE_COMMENT:       '$' ~[\r\n]*    -> channel(HIDDEN);

Here is my input:

[File]
        DescText = "EtherNet/IP EDS for ANT lite+ PLC";
        CreateDate = 02-16-2018;
        CreateTime = 14:13:46;
        ModDate = 10-11-2019;
        ModTime = 11:05:09;
        Revision = 1.2;
        HomeURL = "www.bluebotics.com";
        1_IOC_Details_License = 0x7B457ED4;

When I visualize the parse tree with the ANTLR4 GUI, I see that the header was parsed correctly, but the body just has a child for every token. Here is the tree output, where you can see that the body was not parsed into fields at all:

(eds (section (header [ (header_name File) ]) (body DescText  =   "EtherNet/IP EDS for ANT lite+ PLC" ; CreateDate  =  02 16 2018 ; CreateTime  =  14 13 46 ; ModDate  =  10 11 2019 ; ModTime  =  11 05 09 ; Revision  =  1 2 ; HomeURL  =   "www.bluebotics.com" ; 1_IOC_Details_License  =  0x7B457ED4 ;)))

How do I alter my grammar so that ANTLR actually parses the body?

Upvotes: 1

Views: 79

Answers (1)

Bart Kiers

Reputation: 170128

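Place a catch-all rule, ANY : .;, at the end of your grammar so that the lexer does not produce any errors/warnings: every character that no other rule matches becomes an ANY token instead. That way, it is easier to see where things go wrong. With that ANY rule added, you will see that your input is tokenised like this (tokens listed as null are the literals such as '[' and '=' that are defined implicitly inside parser rules and therefore have no symbolic name):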

null                      `[`
Identifier                `File`
null                      `]`
WS                        `\n        `
HeaderID                  `DescText `
null                      `=`
HeaderID                  ` `
String_standard           `"EtherNet/IP EDS for ANT lite+ PLC"`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `CreateDate `
null                      `=`
HeaderID                  ` 02`
ANY                       `-`
Integer                   `16`
ANY                       `-`
Integer                   `2018`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `CreateTime `
null                      `=`
HeaderID                  ` 14`
ANY                       `:`
Integer                   `13`
ANY                       `:`
Integer                   `46`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `ModDate `
null                      `=`
HeaderID                  ` 10`
ANY                       `-`
Integer                   `11`
ANY                       `-`
Integer                   `2019`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `ModTime `
null                      `=`
HeaderID                  ` 11`
ANY                       `:`
Integer                   `05`
ANY                       `:`
Integer                   `09`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `Revision `
null                      `=`
HeaderID                  ` 1`
ANY                       `.`
Integer                   `2`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `HomeURL `
null                      `=`
HeaderID                  ` `
String_standard           `"www.bluebotics.com"`
STMTEND                   `;`
WS                        `\n        `
HeaderID                  `1_IOC_Details_License `
null                      `=`
HeaderID                  ` 0x7B457ED4`
STMTEND                   `;`
EOF                       `<EOF>`

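As you can see, your HeaderID is messing things up: it should really not contain spaces. ANTLR's lexer always takes the longest possible match, and because HeaderID allows a space, it matches "DescText " (with the trailing space) and " 02", which are longer than what Identifier or WS could match at those positions. As a result, the lexer never produces the Identifier, Date, Time, Version or Hex tokens that your field rule expects. Remove this HeaderID rule (and the ANY rule as well) and your parser will parse the input correctly.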
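
To make the longest-match behaviour concrete, here is a rough Python emulation of it. This is only a sketch, not ANTLR itself: the regular expressions are simplified stand-ins for a few of the lexer rules in the question, the string rules and LINE_COMMENT are left out, and the names RULES, FIXED_RULES and tokenize are made up for this illustration.

```python
import re

# Simplified stand-ins for some lexer rules from the question, in grammar order.
# The literal '=' comes first because ANTLR places implicit literal tokens
# ahead of the explicitly named lexer rules.
RULES = [
    ("'='", r"="),
    ("Integer", r"[0-9]+"),
    ("Hex", r"0x[a-fA-F0-9]+"),
    ("Version", r"[0-9]+(?:\.[0-9]+){1,3}"),
    ("Date", r"[0-9]{2}-[0-9]{2}-[0-9]{4}"),
    ("Time", r"[0-9]{2}:[0-9]{2}:[0-9]{2}"),
    ("Identifier", r"[a-zA-Z0-9_]+"),
    ("HeaderID", r"[a-zA-Z0-9_ ]+"),  # note the space in the character class
    ("STMTEND", r";"),
    ("WS", r"[ \t\r\n\f]+"),
    ("ANY", r"."),
]

# The same rules with HeaderID (and the debugging rule ANY) removed.
FIXED_RULES = [rule for rule in RULES if rule[0] not in ("HeaderID", "ANY")]


def tokenize(text, rules):
    """Longest match wins; on equal length, the rule listed first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "WS":  # WS is sent to the hidden channel in the grammar
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


line = "CreateDate = 02-16-2018;"
print(tokenize(line, RULES))
# [('HeaderID', 'CreateDate '), ("'='", '='), ('HeaderID', ' 02'), ('ANY', '-'),
#  ('Integer', '16'), ('ANY', '-'), ('Integer', '2018'), ('STMTEND', ';')]
print(tokenize(line, FIXED_RULES))
# [('Identifier', 'CreateDate'), ("'='", '='), ('Date', '02-16-2018'), ('STMTEND', ';')]
```

With HeaderID present, the emulation reproduces the CreateDate lines of the token dump above (minus the hidden WS tokens). Without it, the line becomes exactly the Identifier '=' Date STMTEND sequence that the field rule can match. The tie-breaking rule also matters for 0x7B457ED4: Hex and Identifier match the same number of characters, and Hex wins only because it is defined earlier in the grammar.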

Upvotes: 2
