RobV

Antlr4 match whole input string or bust

I am new to Antlr4 and have been racking my brain for several days over a behaviour that I simply don't understand. I have the following combined grammar, which I expect to fail and report an error, but it doesn't:

grammar MWE;
parse: cell EOF;
cell: WORD;
WORD: ('a'..'z')+;

If I feed it the input

a4

I expect it to fail to parse this input, because I want the grammar to match the whole input string and not just part of it, which is what the EOF in the parse rule is meant to enforce. Instead, it reports no error (I listen for errors with an error listener implementing the IAntlrErrorListener interface) and gives me the following parse tree:

(parse (cell a) <EOF>)

Why is this?

Answers (1)

Sam Harwell

When the lexer reaches input that no lexer rule matches, its error recovery mechanism is to drop one character and continue with the next. In your case, the lexer drops the character 4, so your parser sees the equivalent of this input:

a
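To make this concrete, here is a small Python simulation of a longest-match lexer that, like the recovery described above, skips any character no rule matches. It is a sketch under stated assumptions, not ANTLR's implementation: the RULES table, the tokenize function and the (type, text) token tuples are hypothetical stand-ins for the MWE grammar's single WORD rule.

```python
import re

# Hypothetical stand-in for the MWE grammar's only lexer rule: WORD: ('a'..'z')+;
RULES = [("WORD", re.compile(r"[a-z]+"))]


def tokenize(text):
    """Longest-match lexer that skips any character no rule can match."""
    tokens, dropped, pos = [], [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep the longest non-empty match; on a tie the earlier rule wins.
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            # Recovery: remember the offending character, then move past it.
            dropped.append(text[pos])
            pos += 1
            continue
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    tokens.append(("EOF", "<EOF>"))
    return tokens, dropped


tokens, dropped = tokenize("a4")
print(tokens)   # [('WORD', 'a'), ('EOF', '<EOF>')]
print(dropped)  # ['4']
```

Because the skipped 4 never reaches the token stream, the parser only sees WORD followed by EOF, which satisfies parse: cell EOF.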

The solution is to instruct the lexer to create a token for the offending character rather than drop it, and to pass that token on to the parser, which will then report an error. This is done with a catch-all lexer rule of the following form, which must always be the last rule in the grammar: when two lexer rules match input of the same length, ANTLR uses the one defined first, so placing the catch-all rule last means it only matches characters that no other rule can handle. If you have multiple lexer modes, a rule of this form should be the last rule in the default mode and also the last rule in each additional mode.

ErrChar
  : .
  ;
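The effect of the catch-all rule can be sketched with the same kind of hypothetical Python simulation. Here ErrChar is listed last and ties between equally long matches go to the earlier rule, so it only claims a character that WORD cannot match. A hand-written recursive-descent parser for parse: cell EOF; cell: WORD; then rejects the stray token. The names RULES, tokenize, parse and ParseError, and the error message text, are illustrative assumptions, not ANTLR's API or its actual error output.

```python
import re

# Hypothetical stand-ins for the lexer rules, in grammar order.
# ErrChar matches any single character and is deliberately listed last.
RULES = [
    ("WORD", re.compile(r"[a-z]+")),
    ("ErrChar", re.compile(r".", re.DOTALL)),
]


class ParseError(Exception):
    """Raised when the token stream does not fit the grammar."""


def tokenize(text):
    """Longest-match lexer: the longest match wins, ties go to the earlier rule."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (name, m)
        # With ErrChar present some rule always matches, so nothing is dropped.
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    tokens.append(("EOF", "<EOF>"))
    return tokens


def parse(tokens):
    """Recursive-descent check of: parse: cell EOF; cell: WORD;"""
    pos = 0

    def match(kind):
        nonlocal pos
        actual, text = tokens[pos]
        if actual != kind:
            raise ParseError(f"expected {kind}, got {actual} {text!r}")
        pos += 1

    def cell():  # cell: WORD;
        match("WORD")

    cell()        # parse: cell EOF;
    match("EOF")


print(tokenize("a4"))  # [('WORD', 'a'), ('ErrChar', '4'), ('EOF', '<EOF>')]
try:
    parse(tokenize("a4"))
except ParseError as err:
    print("syntax error:", err)  # syntax error: expected EOF, got ErrChar '4'
```

The lowercase letter a is matched by both rules with the same length, and WORD wins only because it is defined first; if ErrChar were moved above WORD, single-letter words would be lexed as ErrChar tokens instead.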
