Reputation: 788
I want to write a grammar with an operator like the Matlab colon operator, where "a:b" and "a:b:c" means slightly different things. And I'd prefer the operator to be nonassociative, since "a:b:c:d" etc wouldn't make sense.
Here's a stripped-down version of my grammar, to show how I've tried to do it:
%union {
int ival;
}
%token tINT
%nonassoc ':'
%%
program: { }
| expression ';' { }
expression: tINT { }
| expression ':' expression { }
| expression ':' expression ':' expression { }
For some reason, bison ignores the second colon rule, with this message:
test.y:16.13-56: warning: rule useless in parser due to conflicts [-Wother]
| expression ':' expression ':' expression { }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why doesn't bison understand what I want here, and how can I change my grammar so it works?
Edit: in case it helps, the output from "bison -v" contains
State 7
4 expression: expression . ':' expression
4 | expression ':' expression .
5 | expression . ':' expression ':' expression
5 | expression ':' expression . ':' expression
':' error (nonassociative)
$default reduce using rule 4 (expression)
though I still don't see why bison shouldn't allow this, nor how to solve it.
Upvotes: 3
Views: 268
Reputation: 3788
Couldn't you simply make it left-associative with the appropriate precedence, then as part of your action code, throw an error if there's more than 2 colons? That is, don't try to handle this in the grammar, but make it a semantic constraint, enforced in C code.
I'm thinking of something like this -- this is pure pseudocode, mind you, because I have no idea what sort of parse tree you're building, but let's assume you're building an abstract syntax tree --
expression
: expression ':' expression
{
if (Head($1) == DoubleColon)
yyerror("Three-colon expressions are not allowed.");
else if (Head($1) == Colon)
$$ = DoubleColon(element_of($1, 1), element_of($1, 2), $3);
else
$$ = Colon($1, $3);
}
Depending on how you handle parentheses, this may get a little more complicated. (I.e. if you encode them explicitly in the parse tree, you can use the above technique, but otherwise you'll need a way to distinguish (a:b):c
from a:b:c
.)
This is an interesting problem. I faced it in a parser I was writing, and I was never able to come up with a general solution. In my case, the two colons had different contexts and different precedences. It seems like something a shift-reduce automaton ought to be able to do, but it's not clear how to tell Bison to generate the transition tables you want. Eventually I had to solve it with a lexer hack that calls back into the parser to do a "simulated parse," a technique that Bison 3.0 uses in its LAC feature.
Upvotes: 1
Reputation: 788
Ah! After a lot of mucking about, I finally found a way that works and still allows me to use precedence the "right" way, though it requires me to assign a precedence to all terminals that can appear in an expression, if I want to avoid tons of shift/reduce warnings. Also it needs a dummy token to give the single-colon rule an appropriate precedence (a little lower than ':', in order to avoid the understandable shift-reduce conflict).
I'm not completely happy with having to assign precedence to terminals like tINT, since my real grammar has a bunch of such things, but I suppose this will have to do, if nobody has a better idea.
%union {
int ival;
}
%token tINT
%precedence nCOLON
%nonassoc ':'
%precedence tINT
%%
expression: tINT { }
| expression_colon expression %prec nCOLON { }
| expression_colon expression ':' expression { }
expression_colon: expression ':'
Upvotes: 1
Reputation: 27632
You've specified that the ':' operator is nonassociative. Remove that, and you'll instead get the expected conflicts from A:B:C being parsable as one of A:B:C, (A:B):C and A:(B:C).
But since you want it to be nonassociative, here is one way:
%token tINT
%%
program: { }
| colonexpression ';' { }
;
colonexpression: expression { }
| expression ':' expression ':' expression { }
| expression ':' expression { }
;
expression: tINT { }
;
%%
Upvotes: 2