Reputation: 2588
My real grammar is far more complex, but I have stripped the problem down to this grammar:
grammar test2;
options {language=CSharp3;}
@parser::namespace { Test.Parser }
@lexer::namespace { Test.Parser }
start : 'VERSION' INT INT project;
project : START 'project' NAME TEXT END 'project';
START: '/begin';
END: '/end';
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
INT : '0'..'9'+;
NAME: ('a'..'z' | 'A'..'Z')+;
TEXT : '"' ( '\\' (.) |'"''"' |~( '\\' | '"' | '\n' | '\r' ) )* '"';
STARTA
: '/begin hello';
And I want to parse this (for example):
VERSION 1 1
/begin project
testproject "description goes here"
/end
project
As written, this does not work: I get a mismatched token exception. If I remove the last token, STARTA, it works. But why? I don't get it.
Help is really appreciated. Thanks.
Upvotes: 1
Views: 2547
Reputation: 170308
When the lexer sees the input "/begin " (including the space!), it is committed to the rule STARTA. When it can't match said rule, because the next char in the input is a "p" (from "project") and not an "h" (from "hello"), it will try to match another rule that can match "/begin " (including the space!). But there is no such rule, producing the error:
mismatched character 'p' expecting 'h'
and the lexer will not give up the space in order to match the START rule instead.
Remember that last part: once the lexer has matched something, it will not give up on it. It might try other rules that match the same input, but it will not backtrack to match a rule that matches fewer characters!
This is simply how the lexer works in ANTLR 3.x; there is no way around it.
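The lexer's behaviour itself can't be changed, but the grammar can be restructured so that no lexer rule spans the space after "/begin". What follows is only a sketch of the stripped-down grammar with that change, not a drop-in fix for the real one: the parser rule name `hello` is hypothetical, and it assumes STARTA was only ever meant to recognise "/begin" followed by the word "hello".

```antlr
grammar test2;
options {language=CSharp3;}
@parser::namespace { Test.Parser }
@lexer::namespace { Test.Parser }

start   : 'VERSION' INT INT project;
project : START 'project' NAME TEXT END 'project';

// Hypothetical parser rule standing in for the former STARTA token.
// It is not referenced from 'start'; wire it in wherever STARTA was used.
hello   : START 'hello';

// No lexer rule begins with "/begin " any more, so the lexer can
// always finish START before it looks at the following space.
START : '/begin';
END   : '/end';
WS    : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
INT   : '0'..'9'+;
NAME  : ('a'..'z' | 'A'..'Z')+;
TEXT  : '"' ( '\\' (.) | '"''"' | ~('\\' | '"' | '\n' | '\r') )* '"';
```

Two side effects are worth keeping in mind. Because whitespace is on the hidden channel, the parser rule also accepts "/begin" and "hello" separated by several spaces or a line break. And the literal 'hello' becomes its own token type, just as 'project' already is, so the bare word "hello" will no longer be tokenized as a NAME.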
Upvotes: 2