Reputation: 1175
Hello I have a grammar defined for an assembler. I am having trouble resolving the ambiguity.
The ambiguity happens from this
indirect addressing mode for some the opcodes is opcode '(' address')'
For instance code below jmps to the contents of $2AFD
jmp ($2AFD)
I am allowing address to be an expr
but
jsr ($2222 + $23) * 2
is an absolute jump subroutine at $448A
I have a rule where expressions enclosed in '(' ')' are allowed
here are my opcode rules
opcodes:
OPCODE { $$ = opcode($1, i, 0); }
| OPCODE '#' expr { $$ = opcode($1, I, 1, $3); }
| OPCODE expr { $$ = opcode($1, a, 1, $2); }
| OPCODE expr 'X' { $$ = opcode($1, ax, 1, $2); }
| OPCODE expr 'Y' { $$ = opcode($1, ay, 1, $2); }
| OPCODE '(' expr ')' { $$ = opcode($1, ind, 1, $3); }
| OPCODE '(' expr 'X' ')' { $$ = opcode($1, zpix, 1, $3); }
| OPCODE '(' expr ')' 'Y' { $$ = opcode($1, zpiy, 1, $3); }
here are my expression rules
expr:
INTEGER { $$ = con($1); }
| SYMBOL { $$ = id($1); }
| '-' expr %prec UMINUS { $$ = opr(UMINUS, 1, $2); }
| math { $$ = $1; }
| compares { $$ = $1; }
| '(' expr ')' { $$ = $2; }
;
math:
expr OR expr { $$ = opr(OR, 2, $1, $3); }
| expr AND expr { $$ = opr(AND, 2, $1, $3); }
| expr '|' expr { $$ = opr('|', 2, $1, $3); }
| expr '&' expr { $$ = opr('&', 2, $1, $3); }
| expr '^' expr { $$ = opr('^', 2, $1, $3); }
| '~' expr { $$ = opr('~', 1, $2); }
| expr '+' expr { $$ = opr('+', 2, $1, $3); }
| expr '-' expr { $$ = opr('-', 2, $1, $3); }
| expr '*' expr { $$ = opr('*', 2, $1, $3); }
| expr '/' expr { $$ = opr('/', 2, $1, $3); }
;
compares:
expr '<' expr { $$ = opr('<', 2, $1, $3); }
| expr '>' expr { $$ = opr('>', 2, $1, $3); }
| expr GE expr { $$ = opr(GE, 2, $1, $3); }
| expr LE expr { $$ = opr(LE, 2, $1, $3); }
| expr NE expr { $$ = opr(NE, 2, $1, $3); }
| expr EQ expr { $$ = opr(EQ, 2, $1, $3); }
;
My question is how to best resolve the ambiguity.
The default rules work correctly but I do not want 100 sr conflicts and 2 rr conflicts. I don't want to hide the problem with %expect. I would like to use %prec as I do for UMINUS. I am also using flex in this project.
Upvotes: 0
Views: 141
Reputation: 241691
I don't know a good solution using precedence declarations. It might be possible, but it will require more explanation in comments than the hack is worth.
Here's how I'd do it. Note that I've eliminated the distinction between math
and compares
because I don't see the value and it just ends up performing a useless unit reduction every time. But if your real code has more complicated actions which actually do something based on the difference, by all means restore it. It's not relevant to resolving the conflicts.
The point here is to avoid ( expr )
from being parsed as an expression unless it is part of an expression. So the top-level expr
cannot have redundant parentheses, while subexpressions can be surrounded by parentheses.
expr: INTEGER
| SYMBOL
| '-' subexpr %prec UMINUS
| '~' subexpr %prec UMINUS /* See note 1 below */
| subexpr '+' subexpr
| ... /* Everything which was in math */
| subexpr '<' subexpr
| ... /* Everything which was in compares */
;
subexpr:
expr
| '(' subexpr ')'
;
Now, how you write the opcode rule depends on the precise syntax/semantics you're looking to implement. I'd start with this, which has no conflicts:
opcodes:
OPCODE
| OPCODE '#' subexpr
| OPCODE expr
| OPCODE subexpr 'X'
| OPCODE expr 'Y'
| OPCODE '(' subexpr ')'
| OPCODE '(' subexpr 'X' ')'
| OPCODE '(' subexpr ')' 'Y'
Note that I used expr
(which cannot start with (
) for the a
and ay
rules; that's needed to distinguish them from the ind
and zpiy
rules, respectively. Since there is no ambiguity between ax
and zpix
(the X
needs to go inside the parentheses in the latter case), it's possible to allow ($2222 + $23) X
to be parsed as though it had been written ($2222 + $23 X)
but it's quite possible that it should have instead signalled a syntax error.
On the whole, I think that this sort of syntactic near-ambiguity is confusing for human readers and should be avoided, but I'm sure you have your reasons.
Upvotes: 2