ANTLR AST Grammar Issue Mismatched Token Exception

Question

My real grammar is much more complex, but I was able to strip the problem down to this grammar:

grammar test2;
options {language=CSharp3;}

@parser::namespace { Test.Parser }
@lexer::namespace { Test.Parser }

start   : 'VERSION' INT INT project;

project :   START 'project' NAME TEXT END 'project';

START: '/begin';
END: '/end';

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

INT :   '0'..'9'+;

NAME:   ('a'..'z' | 'A'..'Z')+;

TEXT  :  '"'  ( '\\' (.) |'"''"' |~( '\\' | '"' | '\r' | '\n' ) )* '"';

STARTA
    :   '/begin hello';

And I want to parse this (for example):

VERSION 1 1

/begin project

testproject "description goes here"

/end

project

Parsing this fails with a mismatched token exception. If I remove the last token, STARTA, it works. But why? I don't get it.

Help is really appreciated. Thanks.

Bart Kiers · Accepted Answer

When the lexer sees the input "/begin " (including the space!), it is committed to the rule STARTA. When it can't match that rule, because the next char in the input is a "p" (from "project") and not an "h" (from "hello"), it will try to match another rule that can match "/begin " (including the space!). But there is no such rule, which produces the error:

mismatched character 'p' expecting 'h'

and the lexer will not give up the space and match the START rule.

Remember that last part: once the lexer has matched something, it will not give up on it. It might try other rules that match the same input, but it will not backtrack to match a rule that matches fewer characters!

This is simply how the lexer works in ANTLR 3.x; there is no way around it.
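
To make the behaviour described above concrete, here is a minimal Python sketch of a lexer that commits in the same way. It is only a toy model under stated assumptions, not ANTLR's actual implementation: the names match_literal and MismatchedCharacter are made up for illustration, and only the two literal tokens START and STARTA from the question are modelled.

```python
class MismatchedCharacter(Exception):
    """Raised when the toy lexer is committed to a literal it cannot finish."""


def match_literal(text, pos, literals):
    """Match one literal token starting at text[pos].

    literals maps a token name to its literal text. The matcher follows the
    input one character at a time. As long as the next character continues
    some longer literal, it commits to that path and never falls back to a
    shorter literal that it already passed.
    """
    candidates = dict(literals)
    i = 0
    while True:
        # Literals that are fully matched after i characters.
        done = [name for name, lit in candidates.items() if len(lit) == i]
        # Literals that still need more characters.
        longer = {name: lit for name, lit in candidates.items() if len(lit) > i}
        ch = text[pos + i] if pos + i < len(text) else None
        continuing = {name: lit for name, lit in longer.items() if lit[i] == ch}

        if continuing:
            # The next character fits a longer literal: commit to it.
            candidates = continuing
            i += 1
            continue
        if done:
            # Nothing longer was started, so the finished literal wins.
            return done[0], pos + i
        # Committed to a longer literal that the input does not complete.
        expected = " or ".join(repr(lit[i]) for lit in longer.values())
        raise MismatchedCharacter(
            f"mismatched character {ch!r} expecting {expected}"
        )
```

With both rules present, match_literal("/begin project", 0, {"START": "/begin", "STARTA": "/begin hello"}) consumes the space, commits to STARTA, and then raises on the "p". With only {"START": "/begin"}, the same call returns ("START", 6), which mirrors what happens when STARTA is removed from the grammar.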
