Ashkan Kazemi
Ashkan Kazemi

Reputation: 1097

Parsing Cool Language with antlr, cant print the desired out put

I'm a writing a parser/lexer for COOL ( classroom object-oriented language ). u can see the grammar in the following link: ( LAST PAGE OF THE MANUAL )

http://theory.stanford.edu/~aiken/software/cool/cool-manual.pdf

I am using ANTLR to write this program and with the following input, I desire the following output :

input :

class Main inherits IO {
  main(): Object {{
    x <- 2 + 3 *4;
  }};
};

output :

1
2
3
11
6
16
27
18
27
27

but the output I am getting is:

1
2
3
11
6
27
16
27
18
27

and here is my parser/lexer code:

// parser
grammar CA2;

program : {System.out.println("1");} (classdef';')+ ;
classdef : {System.out.println("2");} CLASS ID (INHERITS ID)? '{' (feature';')* '}' ;
feature : {System.out.println("3");} ID OPENP (formal (','formal)*)? CLOSEP ':' ID '{' expr '}' 
        | {System.out.println("4");} ID ':' ID ( POINTTOLEFT expr )? ;
formal : {System.out.println("5");} ID ':' ID ;
expr : {System.out.println("6");} ID POINTTOLEFT expr exprprime
     | {System.out.println("8");} ID OPENP ( expr (','expr)* )? CLOSEP exprprime
     | {System.out.println("9");} IF expr THEN expr ELSE expr FI exprprime
     | {System.out.println("10");} WHILE expr LOOP expr POOL exprprime 
     | {System.out.println("11");} '{' (expr';')+ '}' exprprime 
     | {System.out.println("12");} LET ID ':' ID (POINTTOLEFT expr)? (','ID ':' ID (POINTTOLEFT expr)?)* IN expr exprprime
     | {System.out.println("13");} CASE expr OF (ID POINTTORIGHT expr ';')+ ESAC exprprime
     | {System.out.println("14");} NEW ID exprprime
     | {System.out.println("15");} ISVOID expr exprprime
     /*| {System.out.println("16");} expr ADD expr
     | {System.out.println("17");} expr SUB expr
     | {System.out.println("18");} expr MULTIPLY expr
     | {System.out.println("19");} expr DIV expr
     | {System.out.println("20");} TILDA expr
     | {System.out.println("21");} expr LARGERTHAN expr
     | {System.out.println("22");} expr LARGEREQ expr
     | {System.out.println("23");} expr EQUALS expr
     | {System.out.println("24");} NOT expr
     | {System.out.println("25");} OPENP expr CLOSEP
     | {System.out.println("26");} ID
     | {System.out.println("27");} INTEGER*/
     | {System.out.println("28");} STRING exprprime | mathex exprprime ;
     /*| {System.out.println("29");} TRUE
     | {System.out.println("30");} FALSE ;*/
exprprime : {System.out.println("7");} (('@'ID)?)'.'ID OPENP (expr (','expr)*)? CLOSEP exprprime | ;
mathex : b ;
b : {System.out.println("24");} NOT b | c ;
cprime : {System.out.println("21");} LARGERTHAN d cprime 
       | {System.out.println("22");} LARGEREQ d cprime 
       | {System.out.println("23");} EQUALS d cprime | ;
c : d cprime ;
dprime : {System.out.println("16");} ADD e dprime 
       | {System.out.println("17");} SUB e dprime | ;
d : e dprime ;
eprime : {System.out.println("18");} MULTIPLY f eprime 
       | {System.out.println("19");} DIV f eprime | ;
e : f eprime ;
f : {System.out.println("20");} TILDA f | g ;
g : {System.out.println("25");} OPENP mathex CLOSEP 
  | {System.out.println("26");} ID 
  | {System.out.println("27");} INTEGER 
  | {System.out.println("29");} TRUE 
  | {System.out.println("30");} FALSE ;

//lexer
TRUE : 'true' ;
FALSE : 'false' ;
INHERITS : 'inherits' ;
CLASS : 'class' ;
IF : 'if' ;
THEN : 'then' ;
ELSE : 'else' ;
FI : 'fi' ;
WHILE : 'while' ;
LOOP : 'loop' ;
POOL : 'pool' ;
LET : 'let' ;
IN : 'in' ;
CASE : 'case' ;
OF : 'of' ;
ESAC : 'esac' ;
NEW : 'new' ;
ISVOID : 'isvoid' ;
NOT : 'not' ;
TILDA : '~' ;
WHITESPACE : [ ' '|'\r'|'\n'|'\t']+ ->skip ;
INTEGER : [0-9]+ ;
ID : ['_'a-zA-Z][a-zA-Z0-9'_']* ;
ADD : '+' ;
MULTIPLY : '*' ;
SUB : '-' ;
DIV : '/' ;
OPENP : '(' ;
CLOSEP : ')' ;
EQUALS : '=' ;
LARGERTHAN : '<' ;
LARGEREQ : '<=' ;
POINTTOLEFT : '<-' ;
POINTTORIGHT : '=>' ;
STRING : '"'(~[\r\n])*'"' ;

This is the code version of the COOL grammar in ANTLR. the parts that are commented in the main code are disambiguated ( means are ridden of ambiguity ! ) and left-recursion-freed in the second part ( mathex rule ).

could anyone point out where is this going wrong and why am i not getting the desired output?

thanks in advance!

Upvotes: 0

Views: 780

Answers (1)

Sam Harwell
Sam Harwell

Reputation: 99859

With the exception of the action in program, each of your println calls appears immediately before a token reference in the grammar. It's clear that this means they will be executed in the same order that the tokens appear in the file.

Your first mismatch between the expected and actual outputs is the reversal of the lines 16 and 27. Your expected output would only occur if the + token in your input appeared before the 2 token in your input, but clearly you can see that this is not the case. The second mismatch occurs for the same reason; specifically it's due to the fact that the expected output assumed the * token would appear earlier in your grammar than the 3 token.


I noticed that you originally wrote a left-recursive expr rule and included embedded actions within it. The following information is not related to solving your specific question, but it's important to understand in case you decide to uncomment that code and use the left-recursive form of expr.

Consider the following left-recursive rule to allow simple addition of identifiers, with the addition of two embedded actions.

expr
  : {a();} ID
  | {b();} expr '+' ID
  ;

As you probably found out, this syntax will not compile with ANTLR. We found that evaluating the expression {b();} at the location where I showed it here resulted in a tremendous (negative) performance impact on the generated code, so we chose not to allow it. The output would have been the Polish prefix form of the expression, while the parser is actually trying to operate on input using infix notation. The solution is to instead emit the infix notation:

expr
  : {a();} ID
  | expr {b();} '+' ID
  ;

By collecting the results of the calls to a and b, you can convert the results any notation you like before writing the results. The other option is moving the embedded actions to a visitor which executes after the parse is complete, where it is trivial to execute them in any order you like.

Further reading: Infix, Postfix, and Prefix

Upvotes: 1

Related Questions