zhujik

Reputation: 6574

ANTLR: Automatic Error Recovery doesn't seem to work

I have a problem with the automatic error recovery of ANTLR v3, which doesn't seem to work in my grammar. Consider the following grammar:

grammar test;

parse   :   define*;

define  :   LPAREN 'define' VARIABLE RPAREN;

// Tokens
LPAREN : '(';
RPAREN : ')';

LETTER  :   ('a'..'z'|'A'..'Z');

VARIABLE : LETTER*;

SPACE : (' ' | '\n' | '\t' | '\r') {$channel = HIDDEN;}; 

When I call the parse rule with the following input:

(define alpha)
(define beta)

it successfully parses both define rules. However, when I insert a token that doesn't fit:

(define alpha)
)
(define beta)

it stops parsing at the first misplaced RPAREN token. I thought ANTLR could handle misplaced tokens and recover by continuing with the next rule, but it doesn't seem to work for me. What am I doing wrong?

Thanks in advance.

Upvotes: 2

Views: 377

Answers (1)

Bart Kiers

Reputation: 170227

That is because when you call the parse rule:

parse : define*;

the parser tries to match as many define rules as possible for the input:

(define alpha)
)
(define beta)

After it successfully matches (define alpha), it sees a ), so it can't match another define rule and stops parsing. Because ) is a valid token for your lexer, and your parse rule is allowed to end after any number of define rules, no warning or error is reported.
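To make the loop behaviour concrete, here is a minimal sketch in plain Java. It is not the code ANTLR generates: it is a hypothetical hand-written recursive-descent parser (class and method names are made up) that treats tokens as plain strings and assumes whitespace has already been dropped. The define* loop only continues while the next token is (, so the stray ) ends the loop and parse() returns normally after consuming just the first definition.

```java
import java.util.List;

// Hypothetical hand-written model of `parse : define* ;` (not ANTLR's generated code).
// Tokens are plain strings; whitespace is assumed to be filtered out already.
public class UnanchoredParse {
    private final List<String> tokens;
    private int pos = 0;

    UnanchoredParse(List<String> tokens) {
        this.tokens = tokens;
    }

    // Current token, or "EOF" once the input is exhausted.
    private String la1() {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    private void match(String expected) {
        if (!la1().equals(expected)) {
            throw new IllegalStateException("expecting " + expected + " at '" + la1() + "'");
        }
        pos++;
    }

    // define : LPAREN 'define' VARIABLE RPAREN ;
    private void define() {
        match("(");
        match("define");
        pos++; // VARIABLE: accept any single token in this simplified model
        match(")");
    }

    // parse : define* ;  The loop only continues while "(" predicts another define.
    int parse() {
        int count = 0;
        while (la1().equals("(")) {
            define();
            count++;
        }
        return count; // any leftover tokens are silently ignored
    }

    public static void main(String[] args) {
        List<String> input = List.of("(", "define", "alpha", ")", ")", "(", "define", "beta", ")");
        UnanchoredParse p = new UnanchoredParse(input);
        int parsed = p.parse();
        System.out.println("defines parsed: " + parsed + ", tokens consumed: " + p.pos + " of " + input.size());
    }
}
```

No exception is thrown for the bad input: the loop simply has nothing left to match, which is the same silent stop you observe with the real grammar.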

You'll need to tell your parser to consume the entire token stream by "anchoring" your main parser rule, i.e., placing the EOF (end-of-file) token at its end:

parse : define* EOF;

If you now parse the input again, you will see the following error on your console:

line 2:0 missing EOF at ')'
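The effect of the anchor can be sketched with a small, hypothetical hand-written Java model (not ANTLR's generated code; names and the error wording are made up). Tokens are plain strings and whitespace is assumed to be filtered out. The define* loop still exits at the stray ), but the trailing match of EOF now fails because tokens remain, so an error is recorded instead of parsing ending silently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written model of `parse : define* EOF ;` (not ANTLR's generated code).
// Tokens are plain strings; whitespace is assumed to be filtered out already.
public class AnchoredParse {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    AnchoredParse(List<String> tokens) {
        this.tokens = tokens;
    }

    // Current token, or "EOF" once the input is exhausted.
    private String la1() {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    // Consume the expected token, or record an error (without consuming) on a mismatch.
    private void match(String expected) {
        if (la1().equals(expected)) {
            pos++;
        } else {
            errors.add("expecting " + expected + " at '" + la1() + "'");
        }
    }

    // define : LPAREN 'define' VARIABLE RPAREN ;
    private void define() {
        match("(");
        match("define");
        pos++; // VARIABLE: accept any single token in this simplified model
        match(")");
    }

    // parse : define* EOF ;
    void parse() {
        while (la1().equals("(")) {
            define();
        }
        match("EOF"); // the anchor: fails unless every token has been consumed
    }

    public static void main(String[] args) {
        AnchoredParse p = new AnchoredParse(
                List.of("(", "define", "alpha", ")", ")", "(", "define", "beta", ")"));
        p.parse();
        p.errors.forEach(System.err::println);
    }
}
```

The model reports the error in its own wording; the real ANTLR message is the "missing EOF" line shown above.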

EDIT

That define* does not recover is probably because it can match a variable number of tokens, which makes the recovery process too hard. The following demo seems to confirm my suspicion:

grammar test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String source =  
        "(define alpha) \n" +
        ")              \n" +
        "(define beta)    ";
    testLexer lexer = new testLexer(new ANTLRStringStream(source));
    testParser parser = new testParser(new CommonTokenStream(lexer));
    parser.parse();
  }
}

parse    : define define EOF {System.out.println("parsed >>>" + $text + "<<<");};
define   : LPAREN 'define' VARIABLE RPAREN;
LPAREN   : '(';
RPAREN   : ')';
fragment
LETTER   : ('a'..'z'|'A'..'Z');
VARIABLE : LETTER+;
SPACE    : (' ' | '\n' | '\t' | '\r') {$channel = HIDDEN;};

If you run the testParser class, the following is printed to the console:

line 2:0 extraneous input ')' expecting LPAREN
parsed >>>(define alpha) 
)              
(define beta)    <<<

I.e., the warning is printed to System.err, but parsing continues when the parse rule is limited to two define rules instead of define*.
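As far as I know, one of the strategies the ANTLR 3 runtime uses when a token mismatch occurs inside a match is single-token deletion: if the current token is wrong but the token after it is the expected one, the current token is reported as extraneous and skipped. Below is a hypothetical, simplified Java model of that idea (not ANTLR's actual code; names and messages are made up, and tokens are plain strings) applied to parse : define define EOF. Because the second define is called unconditionally, the stray ) reaches match("(") and can be deleted there. With define*, the loop decision would simply exit at ) before any match could fail, so this recovery never gets a chance to run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of `parse : define define EOF ;` with single-token deletion
// in match() (simplified illustration, not ANTLR's runtime code).
public class DeletionRecovery {
    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();

    DeletionRecovery(List<String> tokens) {
        this.tokens = tokens;
    }

    // k-th lookahead token (1-based), or "EOF" past the end of the input.
    private String la(int k) {
        int i = pos + k - 1;
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    private void match(String expected) {
        if (la(1).equals(expected)) {
            pos++;
            return;
        }
        if (la(2).equals(expected)) {
            // Single-token deletion: report the stray token, skip it, then consume the expected one.
            errors.add("extraneous input '" + la(1) + "' expecting " + expected);
            pos += 2;
            return;
        }
        throw new IllegalStateException("cannot recover: expecting " + expected + " at '" + la(1) + "'");
    }

    // define : LPAREN 'define' VARIABLE RPAREN ;
    private void define() {
        match("(");
        match("define");
        pos++; // VARIABLE: accept any single token in this simplified model
        match(")");
    }

    // parse : define define EOF ;  Both calls run unconditionally, so the
    // stray ")" is seen by match("(") inside the second define.
    void parse() {
        define();
        define();
        match("EOF");
    }

    public static void main(String[] args) {
        DeletionRecovery p = new DeletionRecovery(
                List.of("(", "define", "alpha", ")", ")", "(", "define", "beta", ")"));
        p.parse();
        p.errors.forEach(System.err::println);
        System.out.println("parsed both defines");
    }
}
```

The model names tokens by their text rather than by token type, so its message says "expecting (" where ANTLR says "expecting LPAREN".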

Upvotes: 2
