Paul Baxter
Paul Baxter

Reputation: 1175

Bison ambiguous rule creating assembler grammar

Hello I have a grammar defined for an assembler. I am having trouble resolving the ambiguity.

The ambiguity happens from this

indirect addressing mode for some the opcodes is opcode '(' address')'

For instance code below jmps to the contents of $2AFD

jmp ($2AFD)

I am allowing address to be an expr

but

jsr ($2222 + $23) * 2

is an absolute jump subroutine at $448A

I have a rule where expressions enclosed in '(' ')' are allowed

here are my opcode rules

opcodes:
    OPCODE                                              { $$ = opcode($1,    i, 0);                     }
    | OPCODE '#' expr                                   { $$ = opcode($1,    I, 1, $3);                 }
    | OPCODE expr                                       { $$ = opcode($1,    a, 1, $2);                 }
    | OPCODE expr 'X'                                   { $$ = opcode($1,   ax, 1, $2);                 }
    | OPCODE expr 'Y'                                   { $$ = opcode($1,   ay, 1, $2);                 }
    | OPCODE '(' expr ')'                               { $$ = opcode($1,  ind, 1, $3);                 }
    | OPCODE '(' expr 'X' ')'                           { $$ = opcode($1, zpix, 1, $3);                 }
    | OPCODE '(' expr ')' 'Y'                           { $$ = opcode($1, zpiy, 1, $3);                 }

here are my expression rules

expr:
    INTEGER                                             { $$ = con($1);                                 }
    | SYMBOL                                            { $$ = id($1);                                  }
    | '-' expr %prec UMINUS                             { $$ = opr(UMINUS, 1, $2);                      }
    | math                                              { $$ = $1;                                      }
    | compares                                          { $$ = $1;                                      }
    | '(' expr ')'                                      { $$ = $2;                                      }
    ;

math: 
    expr OR expr                                        { $$ = opr(OR, 2, $1, $3);                      }
    | expr AND expr                                     { $$ = opr(AND, 2, $1, $3);                     }
    | expr '|' expr                                     { $$ = opr('|', 2, $1, $3);                     }
    | expr '&' expr                                     { $$ = opr('&', 2, $1, $3);                     }
    | expr '^' expr                                     { $$ = opr('^', 2, $1, $3);                     }
    | '~' expr                                          { $$ = opr('~', 1, $2);                         }
    | expr '+' expr                                     { $$ = opr('+', 2, $1, $3);                     }
    | expr '-' expr                                     { $$ = opr('-', 2, $1, $3);                     }
    | expr '*' expr                                     { $$ = opr('*', 2, $1, $3);                     }
    | expr '/' expr                                     { $$ = opr('/', 2, $1, $3);                     }
    ;

compares: 
    expr '<' expr                                       { $$ = opr('<', 2, $1, $3);                     }
    | expr '>' expr                                     { $$ = opr('>', 2, $1, $3);                     }
    | expr GE expr                                      { $$ = opr(GE, 2, $1, $3);                      }
    | expr LE expr                                      { $$ = opr(LE, 2, $1, $3);                      }
    | expr NE expr                                      { $$ = opr(NE, 2, $1, $3);                      }
    | expr EQ expr                                      { $$ = opr(EQ, 2, $1, $3);                      }
    ;

My question is how to best resolve the ambiguity.

The default rules work correctly but I do not want 100 sr conflicts and 2 rr conflicts. I don't want to hide the problem with %expect. I would like to use %prec as I do for UMINUS. I am also using flex in this project.

Upvotes: 0

Views: 141

Answers (1)

rici
rici

Reputation: 241691

I don't know a good solution using precedence declarations. It might be possible, but it will require more explanation in comments than the hack is worth.

Here's how I'd do it. Note that I've eliminated the distinction between math and compares because I don't see the value and it just ends up performing a useless unit reduction every time. But if your real code has more complicated actions which actually do something based on the difference, by all means restore it. It's not relevant to resolving the conflicts.

The point here is to avoid ( expr ) from being parsed as an expression unless it is part of an expression. So the top-level expr cannot have redundant parentheses, while subexpressions can be surrounded by parentheses.

expr: INTEGER
    | SYMBOL
    | '-' subexpr %prec UMINUS
    | '~' subexpr %prec UMINUS  /* See note 1 below */
    | subexpr '+' subexpr
    | ...                       /* Everything which was in math */
    | subexpr '<' subexpr
    | ...                       /* Everything which was in compares */
    ;
subexpr:
      expr
    | '(' subexpr ')'
    ;

Now, how you write the opcode rule depends on the precise syntax/semantics you're looking to implement. I'd start with this, which has no conflicts:

opcodes:
      OPCODE
    | OPCODE '#' subexpr
    | OPCODE expr
    | OPCODE subexpr 'X'
    | OPCODE expr 'Y'
    | OPCODE '(' subexpr ')'
    | OPCODE '(' subexpr 'X' ')'
    | OPCODE '(' subexpr ')' 'Y'

Note that I used expr (which cannot start with () for the a and ay rules; that's needed to distinguish them from the ind and zpiy rules, respectively. Since there is no ambiguity between ax and zpix (the X needs to go inside the parentheses in the latter case), it's possible to allow ($2222 + $23) X to be parsed as though it had been written ($2222 + $23 X) but it's quite possible that it should have instead signalled a syntax error.

On the whole, I think that this sort of syntactic near-ambiguity is confusing for human readers and should be avoided, but I'm sure you have your reasons.


Notes:

  1. Normally, all prefix operators have exactly the same precedence. If using precedence declarations, I prefer to mark all prefix operators with the same precedence token. But everyone has their own style. (All postfix operators also have the same precedence, usually, which is higher to prefix operators.)

Upvotes: 2

Related Questions