Reputation: 540
I have a text file which I am parsing with ANTLR4.
Text format
TEST OFFICE
98 KINGS STREET
DATE 18/05/22 FORT,ABC
ABC HOTEL PLC
CFA 843 PAGE 1
NO 40 DUTCH BUILDING
PARK ROAD HALL
KINGS STREET 01 ACCOUNT NO 5XX75YY200
2128 100 CURRENCY USD
BALANCE AS OF 17MAY22 1,115.75
18MAY22 CLOSING BALANCE 1,115.75
TOTAL DEPOSITS 0 ITEMS .00
TOTAL WITHDRAWALS 0 ITEMS .00
Grammar Rules
// Define a grammar called CustomValidation
grammar CustomValidation;
@header {
package grammar ;
}
init : statement+ ;
statement : detail+ ;
detail : content+ ;
content : id ;
id : (WORD | NUMBER | SIGNS)* ('\r'|'\n')+ ;
WORD : LETTER+ ;
NUMBER : DIGIT+ ;
SIGNS : SIGN+ ;
WHITESPACE : ( '\t' | ' ' )+ -> skip ;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment SIGN : ('.'|'+'|'('|')'|'/'|','|'-'|'&'|'\''|':'|'#'|'_'|'*'|';'|'%'|'@'|'"'|'`') ;
fragment DIGIT : ('0'..'9') ;
I get this error:
Error -> line 36127:63 extraneous input '<EOF>' expecting {'\r', '\n', WORD, NUMBER, SIGNS}
What am I doing wrong, and how could I make these rules better?
Upvotes: 0
Views: 1016
Reputation: 170178
Your example input gets parsed properly if you start with the init rule:
// Imports the snippet needs; the generated classes live in package "grammar" because of the @header block
import grammar.CustomValidationLexer;
import grammar.CustomValidationParser;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
String source = " TEST OFFICE\n" +
" 98 KINGS STREET\n" +
" DATE 18/05/22 FORT,ABC\n" +
" ABC HOTEL PLC\n" +
" CFA 843 PAGE 1\n" +
" NO 40 DUTCH BUILDING\n" +
" PARK ROAD HALL\n" +
" KINGS STREET 01 ACCOUNT NO 5XX75YY200\n" +
" 2128 100 CURRENCY USD\n" +
" BALANCE AS OF 17MAY22 1,115.75\n" +
" 18MAY22 CLOSING BALANCE 1,115.75\n" +
" TOTAL DEPOSITS 0 ITEMS .00\n" +
" TOTAL WITHDRAWALS 0 ITEMS .00\n";
CustomValidationLexer lexer = new CustomValidationLexer(CharStreams.fromString(source));
CustomValidationParser parser = new CustomValidationParser(new CommonTokenStream(lexer));
ParseTree root = parser.init();
System.out.println(root.toStringTree(parser));
prints:
(init (statement (detail (content (id TEST OFFICE \n)) (content (id 98 KINGS STREET \n)) (content (id DATE 18 / 05 / 22 FORT , ABC \n)) (content (id ABC HOTEL PLC \n)) (content (id CFA 843 PAGE 1 \n)) (content (id NO 40 DUTCH BUILDING \n)) (content (id PARK ROAD HALL \n)) (content (id KINGS STREET 01 ACCOUNT NO 5 XX 75 YY 200 \n)) (content (id 2128 100 CURRENCY USD \n)) (content (id BALANCE AS OF 17 MAY 22 1 , 115 . 75 \n)) (content (id 18 MAY 22 CLOSING BALANCE 1 , 115 . 75 \n)) (content (id TOTAL DEPOSITS 0 ITEMS . 00 \n)) (content (id TOTAL WITHDRAWALS 0 ITEMS . 00 \n)))))
which looks like this indented:
(init
(statement
(detail
(content
(id TEST OFFICE \n))
(content
(id 98 KINGS STREET \n))
(content
(id DATE 18 / 05 / 22 FORT , ABC \n))
(content
(id ABC HOTEL PLC \n))
(content
(id CFA 843 PAGE 1 \n))
(content
(id NO 40 DUTCH BUILDING \n))
(content
(id PARK ROAD HALL \n))
(content
(id KINGS STREET 01 ACCOUNT NO 5 XX 75 YY 200 \n))
(content
(id 2128 100 CURRENCY USD \n))
(content
(id BALANCE AS OF 17 MAY 22 1 , 115 . 75 \n))
(content
(id 18 MAY 22 CLOSING BALANCE 1 , 115 . 75 \n))
(content
(id TOTAL DEPOSITS 0 ITEMS . 00 \n))
(content
(id TOTAL WITHDRAWALS 0 ITEMS . 00 \n)))))
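If you want to produce an indented view like the one above yourself, here is a minimal sketch that works on the string returned by toStringTree. The class name TreeIndenter and the method indent are made up for this illustration and are not part of ANTLR. It assumes Java 11+ (for String.repeat) and that the input text itself contains no '(' or ')' tokens, since toStringTree does not escape them and they would throw off the nesting depth.

```java
class TreeIndenter {
    /** Starts a new line, indented by nesting depth, before every opening parenthesis. */
    static String indent(String tree) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            if (c == '(') {
                if (out.length() > 0) {
                    // Drop the space that preceded this subtree, then break the line.
                    while (out.length() > 0 && out.charAt(out.length() - 1) == ' ') {
                        out.setLength(out.length() - 1);
                    }
                    out.append('\n').append("  ".repeat(depth));
                }
                out.append(c);
                depth++;
            } else if (c == ')') {
                depth--;
                out.append(c);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A shortened version of the flat tree string shown above.
        String flat = "(init (statement (detail (content (id TEST OFFICE \\n)) "
                + "(content (id 98 KINGS STREET \\n)))))";
        System.out.println(indent(flat));
    }
}
```

In the parsing snippet you would call it as TreeIndenter.indent(root.toStringTree(parser)).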
My guess is that the error you're getting is produced by a parser generated from a grammar that differs from the one you posted here. Every time you change the grammar, you need to let ANTLR regenerate the lexer and parser classes.
Upvotes: 1