eatonphil

Reputation: 13682

lvalue grammar in yacc reduces incorrectly

I'm trying to parse multidimensional arrays with YACC. Here is my lvalue definition:

    lvalue: ID { EM_debug("got lvalue identifier " + to_String($1));
            $$.My_VAR = A_SimpleVar($$.pos, $1);
            $$.size = 0;
            $$.name = $1;
        }
    | lvalue L_SQUARE_BRACKET exp R_SQUARE_BRACKET { EM_debug("got lvalue[exp]");
            $$.My_VAR = A_SubscriptVar($$.pos, $1.My_VAR, $3.My_AST);
            $$.size = $3.My_AST;
            $$.name = $1.name;
        }
    ;

For the (simplified) input ia[2], it prints "got lvalue identifier ia" and then reports a parse error when it encounters the left bracket. I don't understand why this doesn't work: the parser should see the left bracket in its lookahead and shift, not reduce immediately like this. How can I prevent it from reducing?

Upvotes: 0

Views: 364

Answers (2)

rici

Reputation: 241721

On the contrary, the reduction is completely correct. In order to apply

lvalue: lvalue L_SQUARE_BRACKET exp R_SQUARE_BRACKET

to the input

ia[2]

the parser needs to make ia into an lvalue before shifting the [ (assuming that L_SQUARE_BRACKET is a [, see below). It does this by using the rule lvalue: ID, so we can expect that rule to run before the [ is shifted.
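To make that ordering concrete, here is a minimal hand-rolled sketch in C of a shift-reduce loop for this grammar. It is not what bison generates (bison uses state tables and lookahead); the symbol codes, the helper names, and the extra rule exp: NUM are assumptions made only for this illustration. For this small grammar, reducing greedily happens to produce the same order of actions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol codes for this sketch only. */
enum sym { ID, NUM, LBRACK, RBRACK, LVALUE, EXP };

/* Append one action description, followed by a newline, to the trace. */
static void note(char *trace, size_t cap, const char *action) {
    size_t used = strlen(trace);
    snprintf(trace + used, cap - used, "%s\n", action);
}

/* Reduce the top of the stack if it matches a rule's right-hand side.
   Returns 1 if a reduction happened, 0 otherwise. */
static int try_reduce(enum sym *stack, int *top, char *trace, size_t cap) {
    int n = *top;
    if (n >= 1 && stack[n - 1] == ID) {
        stack[n - 1] = LVALUE;
        note(trace, cap, "reduce lvalue: ID");
        return 1;
    }
    if (n >= 1 && stack[n - 1] == NUM) {
        stack[n - 1] = EXP;   /* assumed rule exp: NUM */
        note(trace, cap, "reduce exp: NUM");
        return 1;
    }
    if (n >= 4 && stack[n - 4] == LVALUE && stack[n - 3] == LBRACK &&
        stack[n - 2] == EXP && stack[n - 1] == RBRACK) {
        *top = n - 3;         /* four symbols collapse into one lvalue */
        note(trace, cap, "reduce lvalue: lvalue '[' exp ']'");
        return 1;
    }
    return 0;
}

/* Shift each token, reducing whenever a handle is on top of the stack.
   Returns 1 if the whole input reduces to a single lvalue.
   cap must be at least 1. */
int trace_parse(const enum sym *tokens, int count, char *trace, size_t cap) {
    static const char *names[] = { "ID", "NUM", "'['", "']'" };
    enum sym stack[64];
    int top = 0;
    trace[0] = '\0';
    for (int i = 0; i < count && top < 64; i++) {
        char line[32];
        if (tokens[i] > RBRACK) return 0;   /* only terminals may be shifted */
        snprintf(line, sizeof line, "shift %s", names[tokens[i]]);
        stack[top++] = tokens[i];
        note(trace, cap, line);
        while (try_reduce(stack, &top, trace, cap)) { }
    }
    return top == 1 && stack[0] == LVALUE;
}
```

Calling trace_parse on the tokens ID, '[', NUM, ']' records "reduce lvalue: ID" immediately after "shift ID" and before "shift '['", which is exactly the order the question observed.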

So that's not the problem, and there's not enough information in the question to provide a better diagnosis. However, for what it's worth, a few notes:

1) Personally, I find it much less error-prone and easier to read to use literal characters in bison rules:

lvalue: lvalue '[' exp ']'

which of course needs to be matched with a flex rule which returns the literal characters:

"["|"]"  { return *yytext; }

(or, using the possibly less readable character-class syntax, [][], which can be extended to a longer list of single-character tokens such as [][(){}<>=+*/-]; just remember that ] must come first in a character class, and - should come last so that it is not read as a range).

It's entirely possible that there is a mismatch between your scanner and your parser which results in the [ not being sent with the correct token type; you certainly need to eliminate that possibility for debugging.
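As a rough illustration of what "the correct token type" means with literal characters, here is a small hand-written scanner sketch in C. It is not flex output; the function name next_token and the codes TOK_ID and TOK_NUM are hypothetical. The point is only that a single-character token is returned as its own character code, so it matches '[' in the grammar with no separate name to keep in sync.

```c
#include <ctype.h>

/* Hypothetical codes for named tokens, chosen above 255 so they can
   never collide with a literal character code. */
enum { TOK_EOF = 0, TOK_ID = 300, TOK_NUM = 301 };

/* Return the type of the next token in *cursor and advance the cursor
   past it. Single-character tokens are returned as themselves. */
int next_token(const char **cursor) {
    const char *p = *cursor;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                                  /* skip whitespace */

    if (*p == '\0') {
        *cursor = p;
        return TOK_EOF;
    }
    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;                              /* consume the identifier */
        *cursor = p;
        return TOK_ID;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;                              /* consume the number */
        *cursor = p;
        return TOK_NUM;
    }
    *cursor = p + 1;
    return (unsigned char)*p;                 /* e.g. '[' or ']' */
}
```

Feeding it "ia[2]" yields TOK_ID, '[', TOK_NUM, ']' and then TOK_EOF. Printing the values your real scanner returns for the same input is a quick way to rule out a mismatch.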

2) Is bison telling you about any conflicts (including shift-reduce conflicts)? Each of these needs to be tracked down and eliminated.

3) How do you know that the syntax error is being generated when the [ is seen? Have you, for example, enabled flex debugging traces (very handy for debugging) and/or bison debugging traces (which I find more useful than scattering print statements in your actions, but YMMV)?

Upvotes: 1

randomusername

Reputation: 8097

Don't use YACC to distinguish lvalues from rvalues. Because an lvalue is almost always also a valid rvalue, encoding the distinction in the grammar tends to create reduce/reduce conflicts, and a grammar with such conflicts can no longer be parsed deterministically.

Use a Semantic Analysis phase to check for lval correctness rather than incorporating it into the YACC grammar.
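A minimal sketch of such a check in C, assuming a simplified AST: the node kinds, struct layout, and function names below are hypothetical and are not the asker's A_SimpleVar/A_SubscriptVar types. It also assumes that a subscript is an lvalue only when its base is one, which may not match every language.

```c
#include <stddef.h>

/* Hypothetical node kinds for a simplified AST. */
enum node_kind { NODE_VAR, NODE_SUBSCRIPT, NODE_INT, NODE_BINOP, NODE_ASSIGN };

struct node {
    enum node_kind kind;
    struct node *left;   /* array for subscript, target for assign, lhs for binop */
    struct node *right;  /* index for subscript, value for assign, rhs for binop */
};

/* A plain variable is an lvalue; a subscript is one if its base is. */
int is_lvalue(const struct node *n) {
    if (n == NULL)
        return 0;
    switch (n->kind) {
    case NODE_VAR:
        return 1;
    case NODE_SUBSCRIPT:
        return is_lvalue(n->left);
    default:
        return 0;
    }
}

/* Walk the tree and count assignments whose target is not an lvalue. */
int count_bad_assignments(const struct node *n) {
    if (n == NULL)
        return 0;
    int bad = (n->kind == NODE_ASSIGN && !is_lvalue(n->left)) ? 1 : 0;
    return bad + count_bad_assignments(n->left)
               + count_bad_assignments(n->right);
}
```

With this approach the grammar can accept any expression on the left of an assignment, and a pass like count_bad_assignments reports the invalid ones after parsing.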

For reference, though, GNU Bison resolves a reduce/reduce conflict by choosing the rule that appears first in the grammar file. So that might help you temporarily get around your problem.

Upvotes: 1
