Reputation: 1
I have an ANTLR3 grammar that parses an object called a title, which represents a DOM structure in plain text. Here is a valid sample:
Here is titlepart 1; (##BOLD##this is bold inside a reference text##/BOLD##)
Here is an invalid title that should fail (it does not, which is why I am posting):
Here is titlepart 1;(reference text with no ending parenthesis
Here is the grammar that I am using:
grammar Title;
options {
output = AST;
ASTLabelType=CommonTree;
backtrack=false;
}
tokens {
LPAREN='(';
RPAREN=')';
LCURLY='{';
RCURLY='}';
BOLDSTART='##BOLD##';
BOLDEND='##/BOLD##';
UNDERLINESTART='##UNDERLINE##';
UNDERLINEEND='##/UNDERLINE##';
SYMBOLSTART='##SYMBOL##';
SYMBOLEND='##/SYMBOL##';
SUBSCRIPTSTART='##SUBSCRIPT##';
SUBSCRIPTEND='##/SUBSCRIPT##';
SUPERSCRIPTSTART='##SUPERSCRIPT##';
SUPERSCRIPTEND='##/SUPERSCRIPT##';
IMAGESTART='##IMG##';
IMAGEEND='##/IMG##';
SEMICOLON=';';
}
title: titlepart+;
titlepart: ((bold|anytext|specialtext|underline|symbolref|subscript|superscript|image)+referencetext?(SEMICOLON|EOF));
ANYCHAR: ~(';'
| '('
| '{'
| '}'
| ')');
anytext: ANYCHAR+;
specialtext: LCURLY(bold|referencetext|anytext|underline|symbolref|superscript|subscript|SEMICOLON)*RCURLY;
referencetext: LPAREN(referencepart+)RPAREN;
referencepart: (anytext|underline|bold|symbolref|specialtext|superscript|subscript)+SEMICOLON?;
superscript: SUPERSCRIPTSTART(anytext)*SUPERSCRIPTEND;
image: IMAGESTART(anytext)*IMAGEEND;
subscript: SUBSCRIPTSTART(anytext)*SUBSCRIPTEND;
bold: BOLDSTART(anytext|underline|superscript|subscript)*BOLDEND;
underline: UNDERLINESTART(anytext|bold|superscript|subscript)*UNDERLINEEND;
symbolref: SYMBOLSTART(anytext)*SYMBOLEND;
As you can see, the referencetext rule requires a closing paren, but if I omit it, the parse does not fail.
Here is the log of the parsing:
enter ANYCHAR H line=1:0
exit ANYCHAR e line=1:1
enter title [@0,0:0='H',<4>,1:0]
enter titlepart [@0,0:0='H',<4>,1:0]
enter anytext [@0,0:0='H',<4>,1:0]
enter ANYCHAR e line=1:1
exit ANYCHAR r line=1:2
enter ANYCHAR r line=1:2
exit ANYCHAR e line=1:3
enter ANYCHAR e line=1:3
exit ANYCHAR line=1:4
enter ANYCHAR line=1:4
exit ANYCHAR i line=1:5
enter ANYCHAR i line=1:5
exit ANYCHAR s line=1:6
enter ANYCHAR s line=1:6
exit ANYCHAR line=1:7
enter ANYCHAR line=1:7
exit ANYCHAR t line=1:8
enter ANYCHAR t line=1:8
exit ANYCHAR i line=1:9
enter ANYCHAR i line=1:9
exit ANYCHAR t line=1:10
enter ANYCHAR t line=1:10
exit ANYCHAR l line=1:11
enter ANYCHAR l line=1:11
exit ANYCHAR e line=1:12
enter ANYCHAR e line=1:12
exit ANYCHAR p line=1:13
enter ANYCHAR p line=1:13
exit ANYCHAR a line=1:14
enter ANYCHAR a line=1:14
exit ANYCHAR r line=1:15
enter ANYCHAR r line=1:15
exit ANYCHAR t line=1:16
enter ANYCHAR t line=1:16
exit ANYCHAR line=1:17
enter ANYCHAR line=1:17
exit ANYCHAR 1 line=1:18
enter ANYCHAR 1 line=1:18
exit ANYCHAR ; line=1:19
enter SEMICOLON ; line=1:19
exit SEMICOLON ( line=1:20
exit anytext [@19,19:19=';',<13>,1:19]
enter LPAREN ( line=1:20
exit LPAREN r line=1:21
exit titlepart [@20,20:20='(',<10>,1:20]
exit title [@20,20:20='(',<10>,1:20]
2017-06-30 01:29:35,957 DEBUG [TitleConverter]:317 (<grammar title> (title (titlepart (anytext H e r e i s t i t l e p a r t 1) ;)))
As you can see, it gets to the ( after the ; and just stops parsing. Interestingly, if I add a space after the ; it fails as expected. Can anyone tell me what is going on?
Upvotes: 0
Views: 39
Reputation: 53542
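This should really go in a FAQ, if there were one for ANTLR. If you want your entire input to be parsed, add an end anchor (the built-in EOF token) to your main rule. Without it, the titlepart+ loop exits at the first token that cannot start another titlepart (here the ( right after the ;), and the title rule returns successfully without reading the rest of the input. With a space after the ; the space is an ANYCHAR that starts a new titlepart, so the parser enters referencetext and reports the missing ). The fix: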
title: titlepart+ EOF;
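To illustrate the effect outside ANTLR, here is a minimal Python sketch of a hand-written parser for a much simplified version of the grammar (plain text, an optional parenthesized reference, and ; separators). The function parse_title and its require_eof flag are hypothetical names made up for this illustration and are not part of ANTLR; the flag plays the role of the trailing EOF in the rule above.

```python
class ParseError(Exception):
    """Raised when the toy parser cannot match the input."""


SPECIAL = ";(){}"  # characters excluded from plain text, as in ANYCHAR


def parse_title(text, require_eof):
    """Parse `text` and return the index where parsing stopped.

    Simplified grammar (an assumption for this sketch):
        title:     titlepart+ [EOF if require_eof]
        titlepart: TEXT+ reference? (';' | EOF)
        reference: '(' TEXT+ ')'
    """
    n = len(text)

    def is_text(i):
        return i < n and text[i] not in SPECIAL

    def skip_text(i):
        # TEXT+ : at least one plain character is required
        if not is_text(i):
            raise ParseError("expected text at index %d" % i)
        while is_text(i):
            i += 1
        return i

    def titlepart(i):
        i = skip_text(i)
        if i < n and text[i] == "(":
            i = skip_text(i + 1)
            if i >= n or text[i] != ")":
                raise ParseError("missing ')' at index %d" % i)
            i += 1
        if i < n:  # not at end of input, so a ';' must follow
            if text[i] != ";":
                raise ParseError("expected ';' at index %d" % i)
            i += 1
        return i

    pos = titlepart(0)
    # titlepart+ : loop only while the next character can start a titlepart
    while is_text(pos):
        pos = titlepart(pos)
    if require_eof and pos != n:
        raise ParseError("unparsed input at index %d" % pos)
    return pos


bad = "part 1;(no closing paren"
print(parse_title(bad, require_eof=False))  # stops early at the '(' without error
try:
    parse_title(bad, require_eof=True)
except ParseError as err:
    print("rejected:", err)
```

Without the end check, the function returns the index of the unmatched ( and reports no error, much like the title rule in the log exits at the LPAREN token. With a space after the ; both modes fail, because the space starts a new titlepart whose reference is never closed.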
Upvotes: 1