Ayumu Kasugano
Ayumu Kasugano

Reputation: 440

Resolving messy precedence in Bison Parser

I'm currently building a parser for my own language but I'm having trouble with precedence (I think that's the problem). I only discovered this issue while I was constructing the AST. I defined the following tokens:

%token T_SEMICOLON T_COMMA T_PERIOD T_RETURN T_BOOLEAN T_EXTENDS T_TRUE T_FALSE T_IF T_DO T_NEW T_ELSE T_EQUALSSIGN T_PRINT T_INTEGER T_NONE T_WHILE T_NUMBER T_VARIABLE T_LEFTCURLY T_RIGHTCURLY T_LEFTPAREN T_RIGHTPAREN T_ARROW T_EOF

%left T_OR
%left T_AND
%left T_MINUS T_EQUALSWORD T_PLUS 
%left T_GE T_GEQ
%left T_MULTIPLY T_DIVIDE
%right T_NOT

(only the ones with precedence really matter in this case) and I have a grammar that takes the form

Something: T_PRINT expression T_SEMICOLON

where I have several expression productions, some of them being

expression        : expression T_PLUS expression         {$$ = new PlusNode($1, $3);}                             
                  | expression T_MINUS expression        {$$ = new MinusNode($1, $3);} 
                  | expression T_EQUALSWORD expression   {$$ = new EqualNode($1, $3);}          
                  | expression T_MULTIPLY expression     {$$ = new TimesNode($1, $3);}           
                  | expression T_DIVIDE expression       {$$ = new DivideNode($1, $3);} 
                  | T_NOT expression                     {$$ = new NotNode($2);}                               
                  | T_MINUS expression                   {$$ = new NegationNode($2);}                
                  | T_VARIABLE                           {$$ = new VariableNode($1);} 
                  | T_NUMBER               {$$ = new IntegerLiteralNode($1);}                         
                  | T_TRUE                 {$$ = new BooleanLiteralNode(new IntegerNode(1));}                                      
                  | T_FALSE                {$$ = new BooleanLiteralNode(new IntegerNode(0));}                         
                  | T_NEW T_VARIABLE       {$$ = new NewNode($2, NULL);}

when I try to parse something like print true equals new c0 - new c0; I get a strange output AST that looks like Print(Minus(Equal(BooleanLiteral(1), New("c0")), New("c0"))). Here it looks like the T_EQUALSWORD token has higher precedence than the T_MINUS token even though I defined it otherwise?

This problem also occurs if I change the minus to a plus; inputting print true equals new c0 + new c0; I get Print(Equal(BooleanLiteral(1), GreaterEqual(New("c0"), New("c0")))) as output. The form seems correct but I for some reason am getting an T_GEQ token instead of an T_PLUS?

Interestingly, this parses correctly for T_MULTIPLY and T_DIVIDE instead:

Input: print true equals new c0 * new c0;

Output: Print(Equal(BooleanLiteral(1), Times(New("c0"), New("c0"))))

Input: print true equals new c0 / new c0;

Output: Print(Equal(BooleanLiteral(1), Divide(New("c0"), New("c0"))))

So this seems to work correctly with multiply and divide but is failing with plus and minus. Any ideas?

EDIT: I added %prec T_OR to the T_EQUALSWORD production and this solved the problem when I use subtraction, but I still get the weird T_GEQ token when I use addition.

Upvotes: 0

Views: 160

Answers (1)

rici
rici

Reputation: 241901

This line

%left T_MINUS T_EQUALSWORD T_PLUS 

Means the three operators have the same precedence and associate left to right. The first result you get are consistent with that.

a - b - c     (a - b) - c     # Left associative
a - b = c     (a - b) = c     # Also left associative
a = b - c     (a = b) - c     # Once again

If you want equality to be of lower precedence, give it its own precedence level.

The second problem, with the wrong token being produced, is probably because your scanner produces the wrong token.

Upvotes: 3

Related Questions