Reputation: 505
I am using ANTLR4 to parse .eds files. I wrote a grammar, but the parser attaches every token in a section's body directly to the body node instead of grouping the tokens into fields. It seems like ANTLR4 is just ignoring my grammar rules for the body.
Here is my grammar:
grammar test;
eds : section+;
section : header body;
header : '[' header_name ']';
body : field+;
field : name '=' value STMTEND;
header_name : ~(']')+;
name : Identifier;
raw_value : string
| integer
| hex
| version
| date
| time;
value : raw_value
| list;
list : raw_value list_value+;
list_value : ',' raw_value
| ',';
string : String_standard
| string_list;
string_list : String_standard string_list
| String_standard String_standard;
integer : Integer;
version : Version;
date : Date;
time : Time;
hex : Hex;
String_standard : '"' ( Escape | ~('\'' | '\\' | '\n' | '\r') | '.' | '+' + '/' | ' ') + '"';
Escape : '\\' ( '\'' | '\\' );
Integer : NUMBER+;
Hex : '0' 'x' HEX_DIGIT+;
Version : NUMBER+ '.' NUMBER+
| NUMBER+ '.' NUMBER+ '.' NUMBER+
| NUMBER+ '.' NUMBER+ '.' NUMBER+ '.' NUMBER+;
Date : NUMBER NUMBER '-' NUMBER NUMBER '-' NUMBER NUMBER NUMBER NUMBER;
Time : NUMBER NUMBER ':' NUMBER NUMBER ':' NUMBER NUMBER;
Identifier : Identifier_Char+;
HeaderID : Header_Char+;
fragment
Identifier_Char : LETTER
| NUMBER
| '_';
fragment
Header_Char : LETTER
| NUMBER
| '_'
| ' ';
fragment LETTER : [a-zA-Z];
fragment HEX_DIGIT : [a-fA-F0-9];
fragment NUMBER : [0-9];
STMTEND : SEMICOLON;
fragment SEMICOLON : ';';
fragment NEWLINE : '\r' '\n' | '\n' | '\r';
WS: [ \t\r\n\u000C]+ -> channel(HIDDEN);
LINE_COMMENT: '$' ~[\r\n]* -> channel(HIDDEN);
Here is my input:
[File]
DescText = "EtherNet/IP EDS for ANT lite+ PLC";
CreateDate = 02-16-2018;
CreateTime = 14:13:46;
ModDate = 10-11-2019;
ModTime = 11:05:09;
Revision = 1.2;
HomeURL = "www.bluebotics.com";
1_IOC_Details_License = 0x7B457ED4;
When I visualize the parse tree with the ANTLR4 GUI, I see that the header was parsed correctly, but the body just has one child per token. Here is the tree output, where you can see that the body was not parsed into fields at all:
(eds (section (header [ (header_name File) ]) (body DescText = "EtherNet/IP EDS for ANT lite+ PLC" ; CreateDate = 02 16 2018 ; CreateTime = 14 13 46 ; ModDate = 10 11 2019 ; ModTime = 11 05 09 ; Revision = 1 2 ; HomeURL = "www.bluebotics.com" ; 1_IOC_Details_License = 0x7B457ED4 ;)))
How do I alter my grammar so that antlr actually parses the body?
Upvotes: 1
Views: 79
Reputation: 170128
Place the rule ANY : .; at the end of your grammar so that the lexer does not produce any errors/warnings. That way, it is easier to see where things go wrong. With that ANY rule added, you will see that your input is tokenised like this:
null `[`
Identifier `File`
null `]`
WS `\n `
HeaderID `DescText `
null `=`
HeaderID ` `
String_standard `"EtherNet/IP EDS for ANT lite+ PLC"`
STMTEND `;`
WS `\n `
HeaderID `CreateDate `
null `=`
HeaderID ` 02`
ANY `-`
Integer `16`
ANY `-`
Integer `2018`
STMTEND `;`
WS `\n `
HeaderID `CreateTime `
null `=`
HeaderID ` 14`
ANY `:`
Integer `13`
ANY `:`
Integer `46`
STMTEND `;`
WS `\n `
HeaderID `ModDate `
null `=`
HeaderID ` 10`
ANY `-`
Integer `11`
ANY `-`
Integer `2019`
STMTEND `;`
WS `\n `
HeaderID `ModTime `
null `=`
HeaderID ` 11`
ANY `:`
Integer `05`
ANY `:`
Integer `09`
STMTEND `;`
WS `\n `
HeaderID `Revision `
null `=`
HeaderID ` 1`
ANY `.`
Integer `2`
STMTEND `;`
WS `\n `
HeaderID `HomeURL `
null `=`
HeaderID ` `
String_standard `"www.bluebotics.com"`
STMTEND `;`
WS `\n `
HeaderID `1_IOC_Details_License `
null `=`
HeaderID ` 0x7B457ED4`
STMTEND `;`
EOF `<EOF>`
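As you can see, your HeaderID rule is messing things up: it should really not contain spaces. The lexer prefers the rule that matches the most characters, and because HeaderID accepts letters, digits, underscores and spaces, it swallows the field names and the start of most values. The parser therefore never receives the Identifier, Date, Time or Version tokens that your field rule expects. Remove this HeaderID rule (and the ANY rule as well) and your parser will parse the input correctly.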
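To make the effect of HeaderID concrete, here is a minimal Python sketch of the lexer's selection rule: the longest match wins, and on a tie the rule defined first wins. It is not ANTLR itself. The regular expressions are simplified stand-ins for three of the lexer rules in the question, and the helper name next_token is hypothetical.

```python
import re

# Simplified stand-ins for three lexer rules, listed in grammar order.
# (Assumption: regexes approximate the ANTLR rules; this is not ANTLR code.)
RULES = [
    ("Integer", re.compile(r"[0-9]+")),
    ("Identifier", re.compile(r"[A-Za-z0-9_]+")),
    ("HeaderID", re.compile(r"[A-Za-z0-9_ ]+")),
]


def next_token(text, pos, rules):
    """Return (rule_name, lexeme) for the longest match at pos.

    Only a strictly longer match replaces the current best,
    so the earliest rule wins when lengths are equal.
    """
    best = None
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


line = 'DescText = "EtherNet/IP EDS for ANT lite+ PLC";'

# With HeaderID present, the trailing space makes HeaderID the longest match.
with_header = next_token(line, 0, RULES)

# Without HeaderID, the field name becomes an Identifier.
without_header = next_token(line, 0, RULES[:2])

print(with_header)
print(without_header)
```

With HeaderID in the rule list, the field name comes out as a HeaderID token that includes the trailing space, which is what the token dump above shows. Dropping the rule lets Identifier match the name instead.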
Upvotes: 2