ANTLR grammar not validating correctly

Question

I have an ANTLR 3 grammar that parses an object called a title, which represents a DOM structure in plain text. Here is a valid sample:

Here is titlepart 1; (##BOLD##this is bold inside a reference text##/BOLD##)

Here is an invalid title that should fail (it does not, which is why I am posting):

Here is titlepart 1;(reference text with no ending parenthesis

Here is the grammar I am using:

grammar Title;

options {
    output = AST;
    ASTLabelType=CommonTree;
    backtrack=false; 
}



tokens {
    LPAREN='(';
    RPAREN=')';
    LCURLY='{';
    RCURLY='}';
    BOLDSTART='##BOLD##';
    BOLDEND='##/BOLD##';
    UNDERLINESTART='##UNDERLINE##';
    UNDERLINEEND='##/UNDERLINE##';
    SYMBOLSTART='##SYMBOL##';
    SYMBOLEND='##/SYMBOL##';
    SUBSCRIPTSTART='##SUBSCRIPT##';
    SUBSCRIPTEND='##/SUBSCRIPT##';
    SUPERSCRIPTSTART='##SUPERSCRIPT##';
    SUPERSCRIPTEND='##/SUPERSCRIPT##';
    IMAGESTART='##IMG##';
    IMAGEEND='##/IMG##';
    SEMICOLON=';';
}


title: titlepart+;

titlepart: ((bold|anytext|specialtext|underline|symbolref|subscript|superscript|image)+referencetext?(SEMICOLON|EOF));

ANYCHAR: ~(';' 
        | '(' 
        | '{' 
        | '}' 
        | ')'); 

anytext: ANYCHAR+;
specialtext: LCURLY(bold|referencetext|anytext|underline|symbolref|superscript|subscript|SEMICOLON)*RCURLY; 
referencetext: LPAREN(referencepart+)RPAREN;
referencepart: (anytext|underline|bold|symbolref|specialtext|superscript|subscript)+SEMICOLON?;

superscript: SUPERSCRIPTSTART(anytext)*SUPERSCRIPTEND; 
image: IMAGESTART(anytext)*IMAGEEND;
subscript: SUBSCRIPTSTART(anytext)*SUBSCRIPTEND; 
bold: BOLDSTART(anytext|underline|superscript|subscript)*BOLDEND; 
underline: UNDERLINESTART(anytext|bold|superscript|subscript)*UNDERLINEEND; 

symbolref: SYMBOLSTART(anytext)*SYMBOLEND;

As you can see, the referencetext rule requires a closing parenthesis, but if I omit it, the parse does not fail.

Here is the log of the parsing:

enter ANYCHAR H line=1:0
exit ANYCHAR e line=1:1
enter title [@0,0:0='H',<4>,1:0]
enter titlepart [@0,0:0='H',<4>,1:0]
enter anytext [@0,0:0='H',<4>,1:0]
enter ANYCHAR e line=1:1
exit ANYCHAR r line=1:2
enter ANYCHAR r line=1:2
exit ANYCHAR e line=1:3
enter ANYCHAR e line=1:3
exit ANYCHAR   line=1:4
enter ANYCHAR   line=1:4
exit ANYCHAR i line=1:5
enter ANYCHAR i line=1:5
exit ANYCHAR s line=1:6
enter ANYCHAR s line=1:6
exit ANYCHAR   line=1:7
enter ANYCHAR   line=1:7
exit ANYCHAR t line=1:8
enter ANYCHAR t line=1:8
exit ANYCHAR i line=1:9
enter ANYCHAR i line=1:9
exit ANYCHAR t line=1:10
enter ANYCHAR t line=1:10
exit ANYCHAR l line=1:11
enter ANYCHAR l line=1:11
exit ANYCHAR e line=1:12
enter ANYCHAR e line=1:12
exit ANYCHAR p line=1:13
enter ANYCHAR p line=1:13
exit ANYCHAR a line=1:14
enter ANYCHAR a line=1:14
exit ANYCHAR r line=1:15
enter ANYCHAR r line=1:15
exit ANYCHAR t line=1:16
enter ANYCHAR t line=1:16
exit ANYCHAR   line=1:17
enter ANYCHAR   line=1:17
exit ANYCHAR 1 line=1:18
enter ANYCHAR 1 line=1:18
exit ANYCHAR ; line=1:19
enter SEMICOLON ; line=1:19
exit SEMICOLON ( line=1:20
exit anytext [@19,19:19=';',<13>,1:19]
enter LPAREN ( line=1:20
exit LPAREN r line=1:21
exit titlepart [@20,20:20='(',<10>,1:20]
exit title [@20,20:20='(',<10>,1:20]
2017-06-30 01:29:35,957 DEBUG [TitleConverter]:317 ( (title (titlepart (anytext H e r e   i s   t i t l e p a r t   1) ;)))

As you can see, the parser reaches the ( after the ; and simply stops, without reporting an error. Interestingly, if I add a space after the ; the parse fails as expected. Can anyone tell me what is going on?
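
One possible explanation, offered as a sketch and assuming parsing starts at the title rule: title has no end-of-input anchor. Once one titlepart has matched, the titlepart+ loop can exit as soon as the next token (here the LPAREN) cannot start another titlepart, and the rule returns without examining the rest of the input. That matches the log, where titlepart and title both exit with the ( token still pending. With a space after the ; the space is lexed as ANYCHAR, so a second titlepart begins, referencetext is entered, and the missing ) is then reported. Anchoring the start rule at EOF should force the parser to account for the whole input:

```
// Sketch only: the start rule now requires end of input after the last titlepart.
// EOF! keeps the EOF token out of the AST (output = AST is set in the options).
title: titlepart+ EOF!;
```

The rest of the grammar is left unchanged in this sketch. With the anchor in place, the invalid sample would be expected to produce a mismatched-token error at the ( instead of a silent partial parse.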
