Ove
Ove

Reputation: 788

"Rule useless in parser" with two operators in a row

I want to write a grammar with an operator like the Matlab colon operator, where "a:b" and "a:b:c" means slightly different things. And I'd prefer the operator to be nonassociative, since "a:b:c:d" etc wouldn't make sense.

Here's a stripped-down version of my grammar, to show how I've tried to do it:

%union {
  int ival;
}

%token tINT

%nonassoc ':'

%%

program: { }
       | expression ';' { }

expression: tINT { }
          | expression ':' expression { }
          | expression ':' expression ':' expression { }

For some reason, bison ignores the second colon rule, with this message:

test.y:16.13-56: warning: rule useless in parser due to conflicts [-Wother]
           | expression ':' expression ':' expression { }
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Why doesn't bison understand what I want here, and how can I change my grammar so it works?

Edit: in case it helps, the output from "bison -v" contains

State 7

    4 expression: expression . ':' expression
    4           | expression ':' expression .
    5           | expression . ':' expression ':' expression
    5           | expression ':' expression . ':' expression

    ':'  error (nonassociative)

    $default  reduce using rule 4 (expression)

though I still don't see why bison shouldn't allow this, nor how to solve it.

Upvotes: 3

Views: 268

Answers (3)

librik
librik

Reputation: 3788

Couldn't you simply make it left-associative with the appropriate precedence, then as part of your action code, throw an error if there's more than 2 colons? That is, don't try to handle this in the grammar, but make it a semantic constraint, enforced in C code.

I'm thinking of something like this -- this is pure pseudocode, mind you, because I have no idea what sort of parse tree you're building, but let's assume you're building an abstract syntax tree --

expression
    : expression ':' expression
    {
        if (Head($1) == DoubleColon)
            yyerror("Three-colon expressions are not allowed.");
        else if (Head($1) == Colon)
            $$ = DoubleColon(element_of($1, 1), element_of($1, 2), $3);
        else
            $$ = Colon($1, $3);
    }

Depending on how you handle parentheses, this may get a little more complicated. (I.e. if you encode them explicitly in the parse tree, you can use the above technique, but otherwise you'll need a way to distinguish (a:b):c from a:b:c.)

This is an interesting problem. I faced it in a parser I was writing, and I was never able to come up with a general solution. In my case, the two colons had different contexts and different precedences. It seems like something a shift-reduce automaton ought to be able to do, but it's not clear how to tell Bison to generate the transition tables you want. Eventually I had to solve it with a lexer hack that calls back into the parser to do a "simulated parse," a technique that Bison 3.0 uses in its LAC feature.

Upvotes: 1

Ove
Ove

Reputation: 788

Ah! After a lot of mucking about, I finally found a way that works and still allows me to use precedence the "right" way, though it requires me to assign a precedence to all terminals that can appear in an expression, if I want to avoid tons of shift/reduce warnings. Also it needs a dummy token to give the single-colon rule an appropriate precedence (a little lower than ':', in order to avoid the understandable shift-reduce conflict).

I'm not completely happy with having to assign precedence to terminals like tINT, since my real grammar has a bunch of such things, but I suppose this will have to do, if nobody has a better idea.

%union {
  int ival;
}

%token tINT

%precedence nCOLON
%nonassoc ':'
%precedence tINT

%%

expression: tINT { }
          | expression_colon expression %prec nCOLON { }
          | expression_colon expression ':' expression { }

expression_colon: expression ':'

Upvotes: 1

Thomas Padron-McCarthy
Thomas Padron-McCarthy

Reputation: 27632

You've specified that the ':' operator is nonassociative. Remove that, and you'll instead get the expected conflicts from A:B:C being parsable as one of A:B:C, (A:B):C and A:(B:C).

But since you want it to be nonassociative, here is one way:

%token tINT

%%

program: { }
        | colonexpression ';' { }
        ;

colonexpression: expression { }
        | expression ':' expression ':' expression { }
        | expression ':' expression { }
        ;

expression: tINT { }
        ;

%%

Upvotes: 2

Related Questions