Nht_e0

Antlr4 Mismatch input

First of all, I have read the solutions for the following similar questions: q1 q2 q3

Still, I don't understand why I get the following message:

line 1:0 missing 'PROGRAM' at 'PROGRAM'

when I try to match the following:

PROGRAM test
BEGIN
END

My grammar:

grammar Wengo;

program           : PROGRAM id BEGIN pgm_body END ;
id                : IDENTIFIER ;
pgm_body          : decl func_declarations ;
decl              : string_decl decl | var_decl decl | empty ;

string_decl       : STRING id ASSIGN str SEMICOLON ;
str               : STRINGLITERAL ;

var_decl          : var_type id_list SEMICOLON ;
var_type          : FLOAT | INT ;
any_type          : var_type | VOID ; 
id_list           : id id_tail ;
id_tail           : COMA id id_tail | empty ;

param_decl_list   : param_decl param_decl_tail | empty ;
param_decl        : var_type id ;
param_decl_tail   : COMA param_decl param_decl_tail | empty ;

func_declarations : func_decl func_declarations | empty ;
func_decl         : FUNCTION any_type id (param_decl_list) BEGIN func_body END ;
func_body         : decl stmt_list ;

stmt_list         : stmt stmt_list | empty ;
stmt              : base_stmt | if_stmt | loop_stmt ; 
base_stmt         : assign_stmt | read_stmt | write_stmt | control_stmt ;

assign_stmt       : assign_expr SEMICOLON ;
assign_expr       : id ASSIGN expr ;
read_stmt         : READ ( id_list )SEMICOLON ;
write_stmt        : WRITE ( id_list )SEMICOLON ;
return_stmt       : RETURN expr SEMICOLON ;

expr              : expr_prefix factor ;
expr_prefix       : expr_prefix factor addop | empty ;
factor            : factor_prefix postfix_expr ;
factor_prefix     : factor_prefix postfix_expr mulop | empty ;
postfix_expr      : primary | call_expr ;
call_expr         : id ( expr_list ) ;
expr_list         : expr expr_list_tail | empty ;
expr_list_tail    : COMA expr expr_list_tail | empty ;
primary           : ( expr ) | id | INTLITERAL | FLOATLITERAL ;
addop             : ADD | MIN ;
mulop             : MUL | DIV ;

if_stmt           : IF ( cond ) decl stmt_list else_part ENDIF ;
else_part         : ELSE decl stmt_list | empty ;
cond              : expr compop expr | TRUE | FALSE ;
compop            : LESS | GREAT | EQUAL | NOTEQUAL | LESSEQ | GREATEQ ;
while_stmt        : WHILE ( cond ) decl stmt_list ENDWHILE ;

control_stmt      : return_stmt | CONTINUE SEMICOLON | BREAK SEMICOLON ;
loop_stmt         : while_stmt | for_stmt ;
init_stmt         : assign_expr | empty ;
incr_stmt         : assign_expr | empty ;
for_stmt          : FOR ( init_stmt SEMICOLON cond SEMICOLON incr_stmt ) decl stmt_list ENDFOR ;

COMMENT         : '--' ~[\r\n]* -> skip ;
WS              : [ \t\r\n]+ -> skip ;
NEWLINE         : [ \n] ;
EMPTY           : $ ;

KEYWORD         : PROGRAM|BEGIN|END|FUNCTION|READ|WRITE|IF|ELSE|ENDIF|WHILE|ENDWHILE|RETURN|INT|VOID|STRING|FLOAT|TRUE|FALSE|FOR|ENDFOR|CONTINUE|BREAK ;
OPERATOR        : ASSIGN|ADD|MIN|MUL|DIV|EQUAL|NOTEQUAL|LESS|GREAT|LBRACKET|RBRACKET|SEMICOLON|COMA|LESSEQ|GREATEQ ;

IDENTIFIER      : [a-zA-Z][a-zA-Z0-9]* ;
INTLITERAL      : [0-9]+ ;
FLOATLITERAL    : [0-9]*'.'[0-9]+ ;
STRINGLITERAL   : '"' (~[\r\n"] | '""')* '"' ;

PROGRAM     : 'PROGRAM'; 
BEGIN       : 'BEGIN';
END         : 'END';
FUNCTION    : 'FUNCTION';
READ        : 'READ';
WRITE       : 'WRITE';
IF          : 'IF';
ELSE        : 'ELSE';
ENDIF       : 'ENDIF';
WHILE       : 'WHILE';
ENDWHILE    : 'ENDWHILE';
RETURN      : 'RETURN';
INT         : 'INT';
VOID        : 'VOID';
STRING      : 'STRING';
FLOAT       : 'FLOAT' ;
TRUE        : 'TRUE';
FALSE       : 'FALSE';
FOR         : 'FOR';
ENDFOR      : 'ENDFOR';
CONTINUE    : 'CONTINUE';
BREAK       : 'BREAK';

ASSIGN      : ':='; 
ADD     : '+';
MIN     : '-'; 
MUL     : '*';
DIV     : '/'; 
EQUAL       : '='; 
NOTEQUAL    : '!='; 
LESS        : '<';
GREAT       : '>'; 
LBRACKET    : '('; 
RBRACKET    : ')';
SEMICOLON   : ';';
COMA        : ',';
LESSEQ      : '<=';
GREATEQ     : '>=';

From what I've read, I think there's a conflict between the KEYWORD and PROGRAM rules, but removing KEYWORD altogether does not solve the problem.

EDIT: Removing KEYWORD gives the following message:

line 3:0 mismatched input 'END' expecting {'INT', 'STRING', 'FLOAT', '+'}

This is my grun output when KEYWORD is present:

[@0,0:6='PROGRAM',<KEYWORD>,1:0]
[@1,8:11='test',<IDENTIFIER>,1:8]
[@2,13:17='BEGIN',<KEYWORD>,2:0]
[@3,19:21='END',<KEYWORD>,3:0]
[@4,23:22='<EOF>',<EOF>,4:0]
line 1:0 mismatched input 'PROGRAM' expecting 'PROGRAM'
(program PROGRAM test BEGIN END)

This is the output when KEYWORD is removed:

[@0,0:6='PROGRAM',<'PROGRAM'>,1:0]
[@1,8:11='test',<IDENTIFIER>,1:8]
[@2,13:17='BEGIN',<'BEGIN'>,2:0]
[@3,19:21='END',<'END'>,3:0]
[@4,23:22='<EOF>',<EOF>,4:0]
line 3:0 mismatched input 'END' expecting {'INT', 'STRING', 'FLOAT', '+'}
(program PROGRAM (id test) BEGIN (pgm_body decl func_declarations) END)


sepp2k

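The error about "missing 'PROGRAM'" was solved when you removed the KEYWORD rule. KEYWORD is defined before PROGRAM and matches exactly the same text, so the lexer produced KEYWORD tokens (as your grun output shows) instead of the PROGRAM token the parser expects. You should also remove the OPERATOR rule for the same reason.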
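For illustration, here is a minimal sketch assuming a hypothetical, cut-down grammar named MiniWengo that only covers the test input from the question. It keeps one lexer rule per keyword, has no catch-all KEYWORD rule, and places the keyword rules ahead of IDENTIFIER, because IDENTIFIER also matches those words.

```antlr
grammar MiniWengo;

// Hypothetical cut-down grammar: only the tokens needed for the test input.
program    : PROGRAM IDENTIFIER BEGIN END EOF ;

// One rule per keyword and no catch-all KEYWORD rule. The keywords come
// before IDENTIFIER because, when two lexer rules match the same length
// of input, ANTLR emits the token of the rule defined first.
PROGRAM    : 'PROGRAM' ;
BEGIN      : 'BEGIN' ;
END        : 'END' ;

IDENTIFIER : [a-zA-Z][a-zA-Z0-9]* ;
WS         : [ \t\r\n]+ -> skip ;
```

Running grun MiniWengo program -tokens on the test input should then list 'PROGRAM', 'BEGIN' and 'END' token types rather than KEYWORD.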

The error you're encountering now is completely unrelated.

Your current problem concerns the definition of empty, which you didn't show. You've said that you tried both EMPTY : $ ; and EMPTY : ^$ ; (and then presumably empty : EMPTY ;), but neither of those even compiles, so neither could cause the parse error you posted. Either way, the concept of an EMPTY token can't work. When would such a token be generated? Once between every pair of other tokens? In that case, you'd get a lot of "unexpected EMPTY" errors. No, the whole point of an empty rule is that it should succeed without consuming any tokens.

To achieve that, you can just define empty : ; and remove EMPTY altogether. Alternatively you could remove empty as well and just use an empty alternative (i.e. | ;) wherever you're currently using empty. Either approach will make your code work, but there's a better way:

You're using empty as the base case for rules that essentially amount to lists. ANTLR offers the repetition operators * (zero or more) and + (one or more), as well as the ? operator for making things optional. These let you define lists non-recursively and without an empty rule. For example, stmt_list could be defined like this:

stmt_list : stmt* ;

And id_list like this:

id_list : id (',' id)* ;
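As a sketch of how these operators fit together, here is a trimmed, hypothetical grammar (WengoListDemo) that covers only programs containing declarations. Token names follow the question's grammar. It should accept the original test input as well as a body such as INT a, b; FLOAT c;.

```antlr
grammar WengoListDemo;

program       : PROGRAM id BEGIN decl END EOF ;
id            : IDENTIFIER ;

// Zero or more declarations; replaces
// `decl : string_decl decl | var_decl decl | empty ;`.
decl          : (string_decl | var_decl)* ;
string_decl   : STRING id ASSIGN str SEMICOLON ;
str           : STRINGLITERAL ;
var_decl      : var_type id_list SEMICOLON ;
var_type      : FLOAT | INT ;

// One or more comma-separated ids; replaces `id id_tail` plus `id_tail`.
id_list       : id (COMA id)* ;

// Keywords before IDENTIFIER so they win equal-length matches.
PROGRAM       : 'PROGRAM' ;
BEGIN         : 'BEGIN' ;
END           : 'END' ;
STRING        : 'STRING' ;
INT           : 'INT' ;
FLOAT         : 'FLOAT' ;
IDENTIFIER    : [a-zA-Z][a-zA-Z0-9]* ;
STRINGLITERAL : '"' (~[\r\n"] | '""')* '"' ;
ASSIGN        : ':=' ;
SEMICOLON     : ';' ;
COMA          : ',' ;
WS            : [ \t\r\n]+ -> skip ;
```

Because decl can now match nothing at all, no empty rule or EMPTY token is needed for a program with an empty body.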

On an unrelated note, your grammar can be simplified greatly by making use of the fact that ANTLR 4 supports direct left recursion: you can get rid of all the different expression rules and have just one that's left-recursive.

That'd give you:

expr : primary
     | id '(' expr_list ')'
     | expr mulop expr
     | expr addop expr
     ;

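The rules expr_prefix, factor, factor_prefix, postfix_expr and call_expr could then all be removed. Since ANTLR 4 gives earlier alternatives of a left-recursive rule higher precedence, listing the mulop alternative before the addop one makes * and / bind tighter than + and -.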
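A self-contained sketch of that rule, assuming a hypothetical grammar name ExprDemo and a hypothetical entry rule prog, with lexer rules taken from the question, so that an input like f(a, 2) * 3 + 4.5 can be tried in isolation. Unlike primary in the question, it quotes the parentheses.

```antlr
grammar ExprDemo;

// Hypothetical entry rule for trying the expression rule on its own.
prog         : expr EOF ;

// Earlier alternatives bind tighter (mulop before addop); binary
// alternatives are left-associative by default.
expr         : primary
             | id '(' expr_list ')'
             | expr mulop expr
             | expr addop expr
             ;
expr_list    : (expr (',' expr)*)? ;
// Quoted '(' and ')' match parenthesis tokens; unquoted ( ) would be
// ANTLR's own grouping syntax.
primary      : '(' expr ')' | id | INTLITERAL | FLOATLITERAL ;
id           : IDENTIFIER ;
addop        : '+' | '-' ;
mulop        : '*' | '/' ;

IDENTIFIER   : [a-zA-Z][a-zA-Z0-9]* ;
INTLITERAL   : [0-9]+ ;
FLOATLITERAL : [0-9]* '.' [0-9]+ ;
WS           : [ \t\r\n]+ -> skip ;
```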
