Antlr4 how to detect unrecognized token and given sentence is invalid

Question

I am trying to develop a new language with Antlr. Here is my grammar file :

grammar test;

program : vr'.' to'.' e 
        ;
e: be
 | be'.' top'.' be
 ;
be: 'fg' 
  | 'fs' 
  | 'mc' 
  ;
to: 'n' 
  | 'a' 
  | 'ev' 
  ;
vr: 'er' 
  | 'fp' 
  ;
top: 'b' 
  | 'af' 
  ;
Whitespace : [ 	
]+ ->skip 
           ;

Main.java

String expression = "fp.n.fss";
//String expression = "fp.n.fs.fs";
ANTLRInputStream input = new ANTLRInputStream(expression);
testLexer lexer = new testLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
testParser parser = new testParser(tokens);
//remove listener and add listener does not work
ParseTree parseTree = parser.program();

Everything is good for valid sentences. But I want to catch unrecognized tokens and invalid sentences in order to return meaningful messages. Here are two test cases for my problem.

fp.n.fss => anltr gives this error token recognition error at: 's' but i could not handle this error. There are same example error handler class which use BaseErrorListener but in my case it does not work.
fp.n.fs.fs => this sentence is invalid for my grammar but i could not catch. How can i catch invalidations like this sentence?

D3181 · Accepted Answer

Firstly welcome to SO and also to the ANTLR section! Error handling seems to be one of those topics frequently asked about, theres a really good thread here about handling errors in Java/ANTLR4.

You most likely wanted to extend the functionality of the defaultErrorStrategy to handle the particular issues and handle them in a way differently that just printing the error line 1:12 token recognition error at: 's'.

To do this you can implement your own version of the default error strategy class:

Parser parser = new testParser(tokens);
            parser.setErrorHandler(new DefaultErrorStrategy()
    {

        @Override
        public void recover(Parser recognizer, RecognitionException e) {
            for (ParserRuleContext context = recognizer.getContext(); context != null; context = context.getParent()) {
                context.exception = e;
            }

            throw new ParseCancellationException(e);
        }


        @Override
        public Token recoverInline(Parser recognizer)
            throws RecognitionException
        {
            InputMismatchException e = new InputMismatchException(recognizer);
            for (ParserRuleContext context = recognizer.getContext(); context != null; context = context.getParent()) {
                context.exception = e;
            }

            throw new ParseCancellationException(e);
        }
    });

 parser.program(); //back to first rule in your grammar

I would like to also recommend splitting your parser and lexer grammars up, if not for readability but also because many tools used to analyse the .g4 file (ANTLRWORKS 2 particularly) will complain about implicity declarations.

For your example it can be modified to the following structure:

grammar test;

program : vr DOT to DOT e 
        ;
e: be
 | be DOT top DOT be
 ;
be: FG 
  | FS
  | MC 
  ;
to: N
  | A 
  | EV
  ;
vr: ER 
  | FP 
  ;
top: B
  | AF
  ;
Whitespace : [ 	
]+ ->skip 
           ;

DOT : '.'
    ;

A: 'A'|'a'
 ;

AF: 'AF'|'af'
 ;
N: 'N'|'n'
 ;
MC: 'MC'|'mc'
 ;
EV:'EV'|'ev'
 ;
FS: 'FS'|'fs'
 ;
FP: 'FP'|'fp'
 ;
FG: 'FG'|'fg'
 ;
ER: 'ER'|'er'
 ;
B: 'B'|'b'
 ;

You can also find all the methods available for the defaultErrorStrategy Class here and by adding those methods to your "new" error strategy implementation handle whatever exceptions you require.

Hope this helps and Good luck with your project!

Antlr4 how to detect unrecognized token and given sentence is invalid

Answers (1)

Related Questions