Rossco

Reputation: 1122

Allow invalid input in Lexer or Push to Parser?

I'm using ANTLR4 to build a parser and I have an implementation question. I've seen a number of grammars that end with a catch-all lexer rule which matches any single character. I've also seen some recommendations to use such a rule so that the lexer will match any input and always create a token. This effectively passes the problem of invalid input on to the parser, and I'm assuming the grammar authors believe this to be an improvement.

Is this a good idea? If so why?

Upvotes: 0

Views: 158

Answers (1)

GRosenberg

Reputation: 6001

By extending DefaultErrorStrategy and setting it on the parser, you can control how the parser deals with erroneous input. Some number of consecutive unknown/invalid tokens can be skipped and the parser re-synchronized with the input stream. Add an error listener (an implementation of ANTLRErrorListener, typically a subclass of BaseErrorListener) to the parser to report the errors and the recovery.

If these features are not of interest, then drop unknown/invalid source text in the lexer and set the BailErrorStrategy on the parser. You can still use an error listener to report the point and circumstances of a parser failure.

Either approach can be appropriate, depending on your use case.

Update

For example, suppose your input data stream is known to be subject to error bursts, whether due to dropped data in transmission or because you are parsing a text that a user is actively typing, and the goal is to parse what can be parsed and flag as errors whatever cannot be parsed or was skipped. Use a custom error strategy to intelligently re-sync the parser, to minimize the apparent extent of each error burst or even to guess the missing token sequence.

Alternatively, when parsing a text that should have no errors, where the goal is an 'accurate' conversion, immediately terminating the parse on any error is proper. Use the BailErrorStrategy coupled with an error listener tailored to give the most appropriate detail about the source and nature of the error.
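
To make the two policies concrete, here is a minimal, self-contained sketch in plain Java. It is a toy stand-in, not ANTLR code: the names CatchAllDemo, lex, parseSum and the `recover` flag are all made up for illustration. The lexer ends with a catch-all branch that turns any unmatched character into an UNKNOWN token, and the parser then either skips such tokens and records an error (the recover-and-resync idea) or stops at the first one (the bail idea). In a real ANTLR4 project the equivalent choice would be made with parser.setErrorHandler(...) and parser.addErrorListener(...).

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; none of these names are ANTLR APIs.
class CatchAllDemo {
    enum Kind { NUMBER, PLUS, UNKNOWN }

    static final class Token {
        final Kind kind;
        final String text;
        final int pos;
        Token(Kind kind, String text, int pos) {
            this.kind = kind;
            this.text = text;
            this.pos = pos;
        }
    }

    // Lexer with a catch-all last branch: a character that no other rule
    // matches becomes an UNKNOWN token instead of being dropped.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Kind.NUMBER, input.substring(start, i), start));
            } else if (c == '+') {
                tokens.add(new Token(Kind.PLUS, "+", i));
                i++;
            } else {
                tokens.add(new Token(Kind.UNKNOWN, String.valueOf(c), i));
                i++;
            }
        }
        return tokens;
    }

    // Parses NUMBER ('+' NUMBER)* and returns the sum.
    // recover == true : skip each UNKNOWN token, record an error, keep parsing.
    // recover == false: throw on the first UNKNOWN token (bail).
    static int parseSum(List<Token> tokens, boolean recover, List<String> errors) {
        int sum = 0;
        boolean expectNumber = true;
        for (Token t : tokens) {
            if (t.kind == Kind.UNKNOWN) {
                String msg = "unexpected '" + t.text + "' at " + t.pos;
                if (!recover) throw new IllegalStateException(msg);
                errors.add(msg);
                continue; // re-sync: drop the token, keep the parser state
            }
            boolean isNumber = t.kind == Kind.NUMBER;
            if (isNumber != expectNumber) {
                throw new IllegalStateException("syntax error at " + t.pos);
            }
            if (isNumber) sum += Integer.parseInt(t.text);
            expectNumber = !expectNumber;
        }
        if (expectNumber) throw new IllegalStateException("unexpected end of input");
        return sum;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        int sum = parseSum(lex("1 + 2 # + 3"), true, errors);
        System.out.println(sum + " with errors " + errors);
    }
}
```

With `recover` set to true, the stray `#` is reported and the rest of the input is still parsed; with it set to false, the same input aborts the parse at the `#`. Note that the bail variant can only report where the bad character was because the lexer passed it through as a token rather than discarding it.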

Upvotes: 1
