Reputation: 205
I'm going crazy trying to write a parser grammar with ANTLR. I have a plain text file like this:
Diagram : VW 503 FSX 09/02/2015 12/02/2015 STP
Fleet : AAAA
OFF :
AAA 05+44 5R06
KKK 05+55 06.04 1R06 5530
ZZZ 06.24 06.30 1R06 5530
YYY 07.53 REVRSE
YYY 08.23 9G98 5070
WORKS :
MILES :(LD) 1288.35 (ETY) 3.18 (TOT) 1291.53
Each "Diagram" entity is contained between "Diagram :" and the "(TOT)" before EOF. The same plain text file can contain multiple "Diagram" entities.
I've done some tests with ANTLR:
`grammar Hello2;
xxxt : diagram+;
diagram : DIAGRAM_ini txt fleet LEGS+ DIAGRAM_end;
txt : TEXT;
fleet : FLEET_INI txt;
num : NUMBER;
// Lexer Rules
DIAGRAM_ini : 'Diagram :';
DIAGRAM_end : '(TOT)' ;
LEGS : ('AAA' | 'KKK' | 'ZZZ' | 'YYY') ;
FLEET_INI : 'Fleet :';
TEXT : ('a'..'z')+ ;
NUMBER: ('0'..'9') ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ -> skip ;`
My goal is to be able to parse Diagrams recursively and gather the text/numbers of all LEGS.
Any help or tips would be much appreciated! Many thanks.
Regs S.
Upvotes: 1
Views: 3951
Reputation: 8075
I suggest not parsing the file the way you did. This file does not define a language with words and a grammar; it is formatted text made up of characters.
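To illustrate what treating the input as formatted text could look like, here is a minimal line-oriented sketch in plain Java, without ANTLR. It is not a definitive implementation. The class `DiagramSplitter`, its `Diagram` holder and the `afterColon` helper are hypothetical names, and the sketch assumes that leg lines only appear between an "OFF :" line and the following "WORKS :" line, as in the sample from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ANTLR: splits the report into diagrams line by line.
class DiagramSplitter {

    // One "Diagram : ..." block plus the leg lines found after "OFF :".
    static final class Diagram {
        final String header;
        final List<String[]> legs = new ArrayList<>();
        Diagram(String header) { this.header = header; }
    }

    static List<Diagram> parse(List<String> lines) {
        List<Diagram> diagrams = new ArrayList<>();
        Diagram current = null;
        boolean inLegs = false; // assumption: legs sit between "OFF :" and "WORKS :"
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("Diagram")) {
                // A new "Diagram :" line closes the previous diagram and opens the next one.
                current = new Diagram(afterColon(line));
                diagrams.add(current);
                inLegs = false;
            } else if (current == null) {
                continue; // ignore anything before the first diagram
            } else if (line.startsWith("OFF")) {
                inLegs = true;
            } else if (line.startsWith("WORKS") || line.startsWith("MILES")) {
                inLegs = false;
            } else if (inLegs) {
                // Keep each leg as its whitespace-separated fields, e.g. ["AAA", "05+44", "5R06"].
                current.legs.add(line.split("\\s+"));
            }
        }
        return diagrams;
    }

    // Returns the text after the first ':' of a section line, or "" if there is none.
    static String afterColon(String line) {
        int i = line.indexOf(':');
        return i < 0 ? "" : line.substring(i + 1).trim();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Diagram : VW 503 FSX 09/02/2015 12/02/2015 STP",
            "Fleet : AAAA",
            "OFF :",
            "AAA 05+44 5R06",
            "KKK 05+55 06.04 1R06 5530",
            "WORKS :",
            "MILES :(LD) 1288.35 (ETY) 3.18 (TOT) 1291.53");
        for (Diagram d : parse(sample)) {
            System.out.println(d.header + " has " + d.legs.size() + " legs");
        }
    }
}
```

Because every "Diagram :" line starts a new entry, several diagrams in one file end up as separate list elements. If a leg code could itself start with one of the section names, the `startsWith` checks would need to be stricter.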
Solution with ANTLR
You need a looser (less specific) grammar to solve this problem, e.g.:
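grammar diagrams;
// Parser rules: the file is a sequence of "NAME : body" sections.
diagrams : diagram+ ;
diagram : section+ ;
section : WORD ':' body? ;
body : textline+;
// A text line is any mix of words, numbers and signs, ended by at least one line break.
textline : (WORD | NUMBER | SIGNS)* ('\r' | '\n')+;
// Lexer rules
WORD : LETTER+ ;
NUMBER : DIGIT+ ;
SIGNS : SIGN+ ;
// Only spaces and tabs are skipped; line breaks stay visible to the parser.
WHITESPACE : ( '\t' | ' ' )+ -> skip ;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment SIGN : ('.'|'+'|'('|')'|'/') ;
fragment DIGIT : ('0'..'9') ;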
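To get a feel for how the WORD, NUMBER and SIGNS rules cut up a line, the following Java sketch imitates them with a regular expression. It is only a stand-in for illustration, not the lexer ANTLR generates. The class `TokenSketch` and its `tokenize` method are hypothetical names, and the `COLON` alternative is added here as an assumption to stand for the literal ':' used in the `section` rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the generated lexer; it only imitates WORD, NUMBER and SIGNS.
class TokenSketch {

    // Alternatives mirror LETTER+, DIGIT+ and SIGN+; spaces and tabs are matched but dropped.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<WORD>[a-zA-Z]+)|(?<NUMBER>[0-9]+)|(?<SIGNS>[.+()/]+)|(?<COLON>:)|(?<WS>[ \\t]+)");

    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        int pos = 0;
        while (pos < line.length()) {
            m.region(pos, line.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no rule matches at column " + pos);
            }
            if (m.group("WORD") != null) {
                out.add("WORD(" + m.group() + ")");
            } else if (m.group("NUMBER") != null) {
                out.add("NUMBER(" + m.group() + ")");
            } else if (m.group("SIGNS") != null) {
                out.add("SIGNS(" + m.group() + ")");
            } else if (m.group("COLON") != null) {
                out.add("COLON");
            }
            // Whitespace produces no token, like "-> skip" in the grammar.
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("KKK 05+55 06.04 1R06 5530"));
    }
}
```

Note that a field such as `1R06` comes out as three tokens (NUMBER, WORD, NUMBER), and spaces are gone by the time the parser sees the line, so a visitor that needs the original fields has to reassemble them, for example from the token positions.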
Then run a visitor over the resulting parse tree.
Another alternative:
Try out packrat parsing (e.g. parboiled). It is more comprehensible, especially for people with little experience in compiler construction.
Disadvantages:
Upvotes: 1