rajah9

Reputation: 12339

Antlr generated Java doesn't match Antlr IDE

I have a grammar that accepts key / value pairs that appear one per line. The values may be multi-line.

The Eclipse plug-in ANTLR IDE works correctly and accepts a valid test string. However, the generated Java does not accept the same string.

Here is the grammar:

message: block4 ;

block4:  STARTBLOCK '4' COLON expr4+ ENDBLOCK ;

expr4:   NEWLINE (COLON key COLON expr | '-')+;

key:     FIELDVALUE* ; 

expr:    FIELDVALUE* ; 

NEWLINE    : ('\n'|'\r') ;
FIELDVALUE : (~('-'|COLON|ENDBLOCK|STARTBLOCK))+; 
COLON      : ':' ;
STARTBLOCK : '{' ;
ENDBLOCK   : '}' ;

ANTLR IDE parses this correctly (screenshot: SwiftTiny parse tree).

Don't squint: the tree divides the input into key/expression pairs, whether the values are single-line (like 23B / CRED) or multi-line (like 59 / /13212312\r\nRECEIVER NAME S.A\r\n).

Here is the input string:

{4:
:20:007505327853
:23B:CRED
:32A:050902JPY3520000,
:33B:JPY3520000,
:50K:EUROXXXEI
:52A:FEBXXXM1
:53A:MHCXXXJT
:54A:FOOBICXX
:59:/13212312
RECEIVER NAME S.A
:70:FUTURES
:71A:SHA
:71F:EUR12,00
:71F:EUR2,34
-}

When Eclipse runs antlr-3.4-complete.jar on the grammar, it generates SwiftTinyLexer.java and SwiftTinyParser.java. The generated lexer splits the input into 35 tokens, starting with:

  1. STARTBLOCK
  2. 4
  3. COLON
  4. FIELDVALUE
  5. COLON

I would like token 4 to be matched as an expr4 rather than a FIELDVALUE (and the IDE seems to agree with me). Because it is a FIELDVALUE, the parser chokes on that token with the error: line 1:3 required (...)+ loop did not match anything at input '\r\n'.

Why is there a difference between the way that ANTLR 3.4 and ANTLR IDE 2.1.2.201108281759 lex the same string?

Is there a way to fix the grammar so that it matches expr4 before it matches FIELDVALUE?

Upvotes: 1

Views: 216

Answers (1)

rajah9

Reputation: 12339

The IDE input string has a single \n while the Java test code is getting a Windows-style \r\n.
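One way to confirm a mismatch like this is to print the test input with its line-break characters made visible. The sketch below is plain Java with no ANTLR dependency; the class name ShowLineEndings and the method escape are hypothetical helpers, not part of the generated lexer or parser.

```java
// Hypothetical diagnostic helper: makes CR and LF visible in a test string.
final class ShowLineEndings {
    /** Returns the text with each CR and LF replaced by a printable escape. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '\r': sb.append("\\r"); break; // carriage return
                case '\n': sb.append("\\n"); break; // line feed
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Printing ShowLineEndings.escape(input) for both the IDE test string and the string fed to the generated Java should show whether one contains \r\n where the other contains only \n.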

I changed NEWLINE so that it matches one or more line-break characters, that is, from

NEWLINE    : ('\n'|'\r') ;

to

NEWLINE    : ('\n'|'\r')+ ;

This allowed the parse to go forward without the lexical error, and it now makes sense why the IDE behaved differently from the generated Java: they were getting slightly different input strings.
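If changing the grammar is undesirable, another option might be to normalize the line endings before the string reaches the lexer. This is only a sketch, assuming the input is available as a Java String; LineEndings and normalize are hypothetical names, not something the answer above used.

```java
// Hypothetical helper: converts Windows (CRLF) and bare CR line breaks to LF.
final class LineEndings {
    static String normalize(String text) {
        // Handle CRLF first so each Windows line break becomes exactly one LF,
        // then convert any remaining lone CR characters.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }
}
```

The normalized string could then be wrapped in an ANTLRStringStream and passed to SwiftTinyLexer, so that each line break reaches the lexer as a single character, as it did in the IDE.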

Upvotes: 0
