Junior
Junior

Reputation: 557

Bison - treat expr according to syntax expectations

I am developing a syntax where I use constants, variables and arrays. Arrays can be like name[] or just like name. Each statement use arguments like expr or arrays. At parsing time there is no difference between a variable and an array name without [].

array_empty:
                        args_variable '[' ']';

args_array:
                        array_empty {
                            //special command to put array on app stack
                        }
                        | args_variable {
                            //special command to put array on app stack
                        }
args_variable:
                        MYVARIABLE {
                            //Store variable into a variable table
                        }
                        | '(' args_variable ')'
                        ;
args_Ae:
                        expr ',' args_array
            | '(' args_Ae ')';
expr:                   ....

..................

appsomestmt:  APPSOME expr {
                        //do something - one argument
                        }
                        | APPSOME args_Ae {
                        //do something else - two arguments
                        }
                        ;

A statement can have multiple syntax like:

APPSOME expr
APPSOME array, expr

But if I use statement APPSOME variable in my application Bison reduce variable to args_variable then to args_array even the fact that there is only one argument. I want that program to detect that there is only one argument to this statement and use the first syntax APPSOME expr. Only when statement have 2 arguments like APPSOME array, expr I want to use the second syntax APPSOME array, expr

How can I do this? Remember that arrays have some Actions in Mid-Rule to put array on app stack. So, how can I force Bison to treat variables as variables/arrays according to syntax expectations?

EDIT

%{
%}

%token APPSOME1 APPSOME2
%token <number> APPVARIABLE 

%left '-' '+'
%left '*' '/'
%left '^'

%%

program:        programline programnewline program
                | programline
                ;

programnewline:
                '\n' {
                    //some code
                }
                ;

programline:    compoundstmt
            | /* empty */
            ;

compoundstmt:
                        compoundstmt ':' statement
                        | statement
            ;

array_empty:
                        args_variable '[' ']';

// Array Variable Data
args_array:
                        array_empty {
                            //tell interpretor that next is a array
                        }
                        | args_variable {
                            //tell interpretor that next is a array
                        }
                        ;

args_variable:
                        APPVARIABLE {
                                //put variable into table
                        }
                        | '(' args_variable ')'
                        ;

/* multiple arguments */

args_ee:
                        expr ',' expr
            | '(' args_ee ')';
args_ae:
                        args_array ',' expr
            |'(' args_ae ')';
args_aee:
                        args_array ',' expr ',' expr
            |'(' args_ae ')';

/* list of statements */

statement:
            app1stmt
                        | app2stmt

            ;


app1stmt:  APPSOME1 expr {
                            //code
                        }
                        | APPSOME1 args_ae {
                            //code
                        }
                        ;

app2stmt:  APPSOME2 args_ee {
                            //code
                        }
                        | APPSOME2 args_aee {
                            //code
                        }
                        ;

expr:
            '(' expr ')'
            | expr '+' expr {
                //code
            }
            | args_variable {
                                //variable
            }
            | expr '-' expr {
                //code
            }
            | expr '*' expr {
                //code
            }
            | expr '/' expr {
                //code
            }
            | expr '^' expr {
                //code
            }
            ;

%%

Upvotes: 0

Views: 64

Answers (1)

rici
rici

Reputation: 241861

There are a couple of issues with that grammar.

There isn't really a problem with APPSOME1, which takes either expr or array, expr. As soon as the comma is seen, it is clear which option to choose (but see below about parentheses). However, APPSOME2 is not going to work:

APPSOME2: expr ',' expr
        | array ',' expr ',' expr

Here, it is impossible to disambiguate until the second comma is encountered, and that could be arbitrarily distant from the point at which the parser needs to make a decision (which is before the first comma).

The other complication is the ambiguity created between

args_variable
    : APPVARIABLE
    | '(' args_variable ')'
expr: args_variable
    | '(' expr ')'

When the parser sees the ), it cannot know whether or not the last production might apply because that depends on whether a , follows the parenthesized expression.

Personally, I'd solve that one by just not allowing array arguments to be parenthesized, but you could also solve it by introducing a "parenthesized argument" non-terminal which allows the decision to be deferred until after the last close parenthesis. That would mean that if you had

APPSOME1 (((ARRAYNAME))), expr

that you could avoid performing the necessary action on ARRAYNAME until you hit the last close parenthesis, which seems like it should be the case.

But there is no such simple workaround for the APPSOME2 problem.

Normally, I'd recommend using a GLR parser for a problem like this. Since the grammar is not actually ambiguous, as far as I can see, the GLR algorithm should be able to deal with the fact that the grammar is not LR(k) for any k. But you need to carefully consider the processing of parser actions; in the GLR parsers built by bison, actions are not processed until the parser knows which actions will be needed, which means that the action might not be performed until quite a few more tokens have been read. Normally that doesn't matter, but if you are feeding symbol table information back into your lexical scanner, that won't work.

Of course, there is always the possibility of changing the syntax to make the use of arraynames less ambiguous. For example, you might be able to use a syntax like:

APPSOME1 expr
APPSOME1 array expr
APPSOME2 expr, expr
APPSOME2 array expr, expr

in which the array references don't use commas at all. That might be a bit subtle for your users, but it also might make it easier to read the code, since it provides a visual cue about whether the first thing after the command name is to be used as an array or not.

Upvotes: 1

Related Questions