I am a newbie to ANTLR. I want to write a grammar to parse the input below:
commit a1b2c3d4
The grammar is given below:
grammar commit;
file : 'commit' COMMITHASH NEWLINE;
COMMITHASH : [a-z0-9]+;
DATE : ~[\r\n]+;
NEWLINE : '\r'?'\n';
When I try to parse the above input with this grammar, I get the following error:
line 1:0 mismatched input 'commit a1b2c3d4' expecting 'commit'
Note: I intentionally added the DATE token. Without the DATE token, it works fine, but I would like to know what happens when the DATE token is added.
I have read the question Antlr4: Mismatched input, but I am still not clear about what happened.
ANTLR lexers fully assign unambiguous token types before the parser is ever used. When one lexer rule can match more characters than another lexer rule, the rule matching more characters is always preferred by ANTLR, regardless of the order in which the lexer rules appear in the grammar. When two or more rules match exactly the same length of input symbols (and no other rule matches more than this number of input symbols), a token type is assigned for the rule that appears first in the grammar.
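As a rough illustration of these two rules (longest match first, then the earliest rule on a tie), the following Python sketch simulates the token-selection step for the grammar in the question. It is not ANTLR's implementation, which compiles lexer rules into a DFA; the tokenize function and the RULES table are hypothetical, and the sketch assumes the implicit 'commit' literal is ordered ahead of the explicit lexer rules, which is how ANTLR 4 treats literals used in parser rules.

```python
import re

# The question's lexer rules, in the order ANTLR would consider them.
# Assumption: the implicit 'commit' literal (from the parser rule) is
# placed before the explicit lexer rules, so it wins equal-length ties.
RULES = [
    ("'commit'", r"commit"),
    ("COMMITHASH", r"[a-z0-9]+"),
    ("DATE", r"[^\r\n]+"),
    ("NEWLINE", r"\r?\n"),
]


def tokenize(text, rules):
    """Split text into (token type, text) pairs.

    At each position every rule is tried; the longest match wins, and on
    an equal-length match the rule listed first is kept.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly greater: a later rule never displaces an earlier
            # rule that matched the same number of characters.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("commit a1b2c3d4\n", RULES))
# [('DATE', 'commit a1b2c3d4'), ('NEWLINE', '\n')]
```

With DATE present, the whole line becomes one DATE token, so the parser never receives the 'commit' token that the file rule expects, which is exactly what the reported "mismatched input 'commit a1b2c3d4' expecting 'commit'" error says.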
Your lexer contains a rule DATE that matches all characters except for a newline character. Since this always matches the entire text of a line, and none of your tokens span multiple lines, the result is the following:
1. If a line contains only the text commit, an unnamed token corresponding to this input sequence will be produced. COMMITHASH and DATE match the same six characters, but the implicit 'commit' token takes precedence.
2. If a line contains only text matching [a-z0-9]+, a COMMITHASH token will be created for the entire text of the line. DATE also matches this input, but COMMITHASH appears first, so it is used.
3. Otherwise, a DATE token will be created for the entire text of the line. Even if the line starts with commit or a COMMITHASH, the DATE rule will be used because it matches a longer sequence of characters.
4. A NEWLINE token will be created for each newline.

You will need to do one of the following to resolve the problem. The exact strategy depends on the larger problem you are trying to solve.
1. Remove the DATE rule, or rewrite it to match a more specific date format.
2. Restrict where a DATE token might be produced.
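To make the first option concrete, here is a Python sketch of the same longest-match selection with a narrowed DATE rule. It simulates the lexer's choices rather than running ANTLR. The YYYY-MM-DD format, the WS rule, and the tokenize helper are all hypothetical: the question does not say what its dates look like, and its grammar has no whitespace rule, so without one the space after commit would not be matched by any rule (ANTLR reports that as a token recognition error). In ANTLR syntax the assumed rules would look roughly like DATE : [0-9][0-9][0-9][0-9] '-' [0-9][0-9] '-' [0-9][0-9]; and WS : [ \t]+ -> skip;.

```python
import re

# Hypothetical revised rule set (not from the question):
# - DATE is narrowed to an assumed YYYY-MM-DD format.
# - WS is an assumed whitespace rule whose tokens are dropped,
#   playing the role of an ANTLR rule ending in "-> skip".
RULES = [
    ("'commit'", r"commit"),
    ("COMMITHASH", r"[a-z0-9]+"),
    ("DATE", r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    ("NEWLINE", r"\r?\n"),
    ("WS", r"[ \t]+"),
]
SKIPPED = {"WS"}


def tokenize(text, rules, skipped=frozenset()):
    """Longest match wins; on a tie the rule listed first wins.

    Tokens whose type is in `skipped` are matched but not emitted.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        if best_name not in skipped:
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("commit a1b2c3d4\n", RULES, SKIPPED))
# [("'commit'", 'commit'), ('COMMITHASH', 'a1b2c3d4'), ('NEWLINE', '\n')]
print(tokenize("2024-01-15\n", RULES, SKIPPED))
# [('DATE', '2024-01-15'), ('NEWLINE', '\n')]
```

Because DATE now matches only one specific shape of text, it no longer outgrows 'commit' and COMMITHASH on ordinary lines, and the input line yields the three tokens the file rule expects. A date such as 2024-01-15 still becomes a DATE token, since its 10-character match beats the 4-character COMMITHASH match on 2024.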