Matching an unwanted token in ANTLR for error reporting

Question

I have a rule like this (oversimplified but just for demo):

matches :
        MATCHES
    ;

but sometimes I incorrectly use 'matching' instead of 'matches' in my code, and I'd like that to fail with a clear error message. I've previously reached for this type of construct:

matches :
        MATCHES
    |
        MATCHING
        {
            err("Wrong keyword, use MATCHES not MATCHING");
        }
    ;

but that requires defining a lexer token MATCHING, which will interfere with the rest of the lexer. I want to match MATCHING without creating any lexer token at all.

Any thoughts?

Mike Cargal · Accepted Answer

It's likely that, without a lexer rule for "matching", the lexer will tokenize it as something like an IDENTIFIER (assuming your grammar has such a rule).

With that in mind, one option is to let the lexer tokenize "matching" as an IDENTIFIER. You could then write this alternative with IDENTIFIER and a semantic predicate that requires the IDENTIFIER's text to be "matching". Then, in a listener, when you encounter the context that passed the semantic predicate, you can emit your own custom error message.

Something akin to this (untested code, so there may be minor errors):

matches :
        MATCHES
    |
        // Java target: write $id.text.equals("matching"), since == compares references there
        id=IDENTIFIER { $id.text == "matching" }?
    ;
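
One way to flesh out the rule above, as a sketch that assumes the Java target and an existing IDENTIFIER lexer rule: the predicate is moved to the left edge of the alternative so it is checked before IDENTIFIER is consumed, and each alternative gets a label so the generated listener has a dedicated callback for the wrong keyword. The label names MatchesKeyword and WrongKeyword are made up for illustration.

```antlr
// Sketch only; assumes the Java target. Label names are hypothetical.
matches
    : MATCHES                                                     # MatchesKeyword
    // _input.LT(1) is the next token, which has not been consumed yet
    | {_input.LT(1).getText().equals("matching")}? IDENTIFIER     # WrongKeyword
    ;
```

With these labels, ANTLR generates enterWrongKeyword and exitWrongKeyword methods on the listener, and that is where the "use MATCHES not MATCHING" message could be reported.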

You can't really have a successful parse without all of the input being recognized and tokenized. (A failure to tokenize input will result in an error message.) ANTLR will attempt error recovery by inserting or ignoring tokens so that it can proceed, but it will still report an error.

Another possible approach to get your specific error message: you might be able to use a custom ErrorListener and override the error message there (though it can be tricky to identify the context when the error is detected).
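
A rough sketch of that listener approach, assuming the Java target and the original rule that accepts only MATCHES. The class name KeywordErrorListener is hypothetical, and it is placed in @parser::members only so the example stays inside the grammar file; a separate Java file would work just as well. It keys on the offending token's text alone, so it does not solve the problem of identifying the context.

```antlr
// Sketch only; assumes the Java target. KeywordErrorListener is a hypothetical name.
@parser::members {
    public static class KeywordErrorListener extends BaseErrorListener {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol,
                                int line, int charPositionInLine,
                                String msg, RecognitionException e) {
            // For parser errors the offending symbol is the token where the error was detected.
            if (offendingSymbol instanceof Token
                    && "matching".equals(((Token) offendingSymbol).getText())) {
                // Replace the default message. This fires wherever that word is the
                // offending token, not only where MATCHES was expected.
                msg = "Wrong keyword, use MATCHES not MATCHING";
            }
            System.err.println("line " + line + ":" + charPositionInLine + " " + msg);
        }
    }
}
```

It would be installed before parsing with parser.removeErrorListeners() followed by parser.addErrorListener(new KeywordErrorListener()), qualifying the class with the generated parser's name when used from outside it.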
