Reputation: 1967
I am trying to implement a grammar in Antlr4 for a simple template engine. This engine consists of 3 different clauses:
IF ANSWERED ( variable )
END IF
Variable
Variable can be any upper or lowercase letter including white spaces. Both IF ANSWERED
and END IF
are always uppercase.
I have written the following grammar/lexer rules so far, but my problem is that IF ANSWERED
keeps getting recognized as a Variable and not as 2 tokens IF
and ANSWERED
.
grammar program;
/**grammar */
command: (ifStart | ifEnd | VARIABLE ) EOF;
ifStart: IF ANSWERED '(' VARIABLE ')';
ifEnd: 'END IF';
/** lexer */
IF: 'IF';
ANSWERED: 'ANSWERED';
TEXT: (LOWERCASE | UPPERCASE | NUMBER) ;
VARIABLE: (TEXT | [ \t\r\n])+;
fragment LOWERCASE: [a-z];
fragment UPPERCASE: [A-Z];
fragment NUMBER: [0-9];
If I try to parse IF ANSWERED ( FirstName )
I get the following output:
[@0,0:10='IF ANSWERED',**<VARIABLE>**,1:0]
[@1,11:11='(',<'('>,1:11]
[@2,12:25='Execution date',<VARIABLE>,1:12]
[@3,26:26=')',<')'>,1:26]
[@4,27:26='<EOF>',<EOF>,1:27]
line 1:0 mismatched input 'IF ANSWERED' expecting 'IF'
I read that Antlr4 is greedy and tries to match the biggest possible token, but I fail to understand what is the correct approach, or how to think through the problem to find a solution.
Upvotes: 1
Views: 114
Reputation: 170158
Correct: ANTLR's lexer is greedy, and tries to consume as much as possible. That is why IF ANSWERED
is tokenised as a TEXT
token instead of 2 separate keywords. You'll need to change TEXT
so that it does not match spaces.
Something like this could get you started:
parse
: command* EOF
;
command
: (ifStatement | variable)+
;
ifStatement
: IF ANSWERED '(' variable ')' command* END IF
;
variable
: TEXT
;
IF : 'IF';
END : 'END';
ANSWERED : 'ANSWERED';
TEXT : [a-zA-Z0-9]+;
SPACES : [ \t\r\n]+ -> skip;
Upvotes: 2