Michael

Reputation: 8526

Have Antlr lexer handle syntax errors as tokens

I'm using Antlr 4.2.2 and Java 1.7 for some text processing. I've extended BaseErrorListener and overridden syntaxError() to report syntax errors, which works well. But I want it to treat the mismatched text as a token and return it, rather than dropping it entirely.

In my lexer I have this rule:

TEXT : ~[<{|]+ ;

When I try to parse "foo { {" I get a syntax error as expected: token recognition error at: '{ {'. But I'd like that '{ {' to be reported as a token as well, so that it doesn't get dropped from the input stream.

Upvotes: 0

Views: 545

Answers (1)

Onur

Reputation: 5205

You could add a catchall lexer rule like this at the end of the file:

Error : . ;

This produces Error tokens, which the parser will most likely report as extraneous "Error" tokens unless one of your parser rules accepts them.
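To make the ordering concrete, here is a minimal sketch of a combined grammar using the TEXT rule from the question. The grammar name Demo and the parser rule document are hypothetical, added only for illustration. The catch-all goes last because the lexer prefers the longest match and breaks ties in favor of the rule defined first, so Error only wins when nothing else matches.

```antlr
grammar Demo;   // hypothetical grammar name

// Hypothetical parser rule: accept Error tokens explicitly so they
// stay in the parse tree instead of being reported as extra input.
document : (TEXT | Error)* EOF ;

// Rule from the question: any run of characters other than <, { or |
TEXT  : ~[<{|]+ ;

// Catch-all, one character per token. Keep it as the last lexer rule.
Error : . ;
```

With this sketch, the input "foo { {" should tokenize as TEXT, Error, TEXT, Error, since the space between the braces still matches TEXT.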

You could also do this:

 SilentError : . -> channel(LexingErrorChannel); // you need to set the constant for this channel

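This sends the unrecognized characters to a separate channel, so the parser silently ignores them while they remain in the token stream (useful if you want to handle or report them yourself).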
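The channel constant has to exist in the generated lexer. A minimal sketch, assuming a combined grammar on ANTLR 4.2.x with the Java target, where the name passed to channel(...) ends up as a plain identifier in the generated code. The value 2 is an assumption, chosen because 0 is the default channel and 1 is HIDDEN.

```antlr
@lexer::members {
    // Assumed channel number: 0 is the default channel, 1 is HIDDEN.
    public static final int LexingErrorChannel = 2;
}

// Unmatched characters become tokens on the custom channel,
// which the parser does not see.
SilentError : . -> channel(LexingErrorChannel) ;
```

In a standalone lexer grammar the action would be written as @members instead of @lexer::members. Later ANTLR releases also provide a channels { ... } block in lexer grammars for declaring channel names.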

That said, I would avoid doing this if there is any way around it.

Note: This will produce one Error token per character. If you "know" possible errors, you can add other rules like this:

Error : [<{|]+
      | .
      ;

Be careful not to be too greedy though.

Upvotes: 1
