Implementation of grammar using Bison. Grammar control flow unexpected

Question

Abstract grammar. Actual grammar follows:

start ->  t1  V1;

V1 -->   t2   V1
       | t3   V2
       ;

V2 -->   t4
       | /* Empty */
       ;

When the control is in V2 and token t3 is encountered. The control goes back to V1. All right to this point.

But the control doesn't stop at V1 to trigger t3, but goes back to start, this is the problem.

The actual grammar. Explanation follows.

%{
#include 
#include 
#include 

void yyerror (char *s); 
extern FILE *yyin;

%}

%token MAIN   

%token INT_DT INT_LIT   /* t_INT_DT --> Integer Data type, i.e. "int"
                         * t_INT_LIT --> An integer, i.e. "5".
                         */

%token VAR              // A variable

%token SCANF  PRINTF    // For the printf and the scanf statements.

%token IF     ELSE      // For the if and else statements.
%token GOTO

%token OPRTR            // For the operator in an expression.

/* Begining of a basic block, BBS, Basic Block Statement.
 * BBS_ST = ":"
 * BBS_END2   ">;"
 * Example:   
 * :
 *      Statements;
 *      goto ;
 */
%token BBS_ST   // ":" 
%token BBS_END2 // ">;" 

// Opening and closing Braces, i.e. { & }
%token OP_BR  // "{" 
%token CL_BR  // "}"

// For opening and closing Parantheses, ( & )
%token OP_PR  //"("  
%token CL_PR  //")"

%token EQUAL  // "="
%token COMMA  // ","       
%token NBSP   // "&"
%token SM_CLN // ";"       // For ";" 

%token RETURN           // For the return statement.

%start begin

%%

begin:   MAIN    main   ;

main:    OP_BR    functn   CL_BR   ;

functn:   INT_DT   VAR      SM_CLN   functn  /*Recursion to stay in functn */
        | BBS_ST   INT_LIT  BBS_END1  bb     
        ;

bb:    SCANF    var_List   CL_PR    SM_CLN  bb
    |  PRINTF   var_List   CL_PR    SM_CLN  bb
    |  VAR      EQUAL      expr     SM_CLN  bb
    |  IF       OP_PR      expr     CL_PR   bb
    |  ELSE     bb
    |  GOTO     BBS_ST     INT_LIT  BBS_END2
    |  RETURN   SM_CLN    
    |  /* Empty Rule. */  
    ;  /* bb has can't catch BBS_ST as first token, thus it must return
        * to the calling rule and functn can catch BBS_ST. */

var_List:    COMMA   VAR   var_List 
          |  /* Empty Rule. */ 
          ;

expr:     VAR      expr2 
      |   INT_LIT  expr2  
      ;

expr2:    OPRTR    expr3  
       |  /* Empty Rule */
       ;

expr3:    VAR     
       |  INT_LIT  
       ;

%%
int main (int argc, char *argv[])
{
    #if YYDEBUG == 1
        extern int yydebug;
        yydebug = 1;
    #endif

    if (argc == 2)
    {
        yyin = fopen (argv [1], "r");
        if (yyin == NULL)
            perror (argv [1]); 
    }

    yyparse ();

    fclose (yyin);
    return 0;    
}


void yyerror (char *s)
{
    printf ("Error: %s\n", s);
}

Implementation of ideas suggested in answers till now, but this doesn't help.

begin:  MAIN main ; 

main:   OP_BR   functn_Decls   CL_BR 
        ; 

functn_Decls:   functn_Decls INT_DT VAR SM_CLN 
              | functn 
              ; 

functn:    functn BBS_ST INT_LIT BBS_END1 
        |  bb 
        ; 

bb:    bb   stmnt 
    |  /* Empty Rule */ 
    ; 

stmnt:   t1  // terminals only. 
       ; 

var_List:   t2 // terminal just for illustration. 
            ;

expr:  t3 // terminal just for illustration.
       ;

Error causing input:

main ()
{
  :
  goto ;

  :
}

Error:

At the token BBS_ST is triggered. At this time the control is in bb, like bb --> Waiting for a token Since no rule in bb starts with BBS_ST, it returns to the calling rule.



Now the calling rule is functn, which has a rule starting with BBS_ST.
Problem is that, this rule is not invoke.

Instead the control reaches the parent of functn at P1.
If at P1 I add BBS_ST, It catches this token.

Please point out why is it happening like this?

Corresponding Flex file:

%{
    // Include the file
    #include "Parse_One_Line2.tab.h"
    extern YYSTYPE  yylval;
    extern char    *yytext;
%}

LB   ^[ \t]*
CH   [a-zA-Z]
DI   [0-9]

%%

{LB}main[ ]*$[.]*$    return MAIN; // the main() call.

{LB}int[ ]              return INT_DT;

[+-]?{CH}({CH}{DI}_)*   return VAR; 

[+-]?[0-9]+             return INT_LIT;  

{LB}scanf[ ]?\(".*"   return SCANF;

{LB}printf[ ]?\(".*"  return PRINTF;

{LB}if[ ]               return IF; // If statement.

{LB}else[ ]             return ELSE; // If statement.

{LB}goto[ ]             return GOTO;

{LB}return              return RETURN;

\{CLOBBER\};            ;// Ignore it.

":"                    return BBS_END1; // token Greater than Colon.

">;"                    return BBS_END2;

^;;[^\n]*[\n]           ; // Ignore it, comment.

"+"|"-"|"*"|"/"|"%"|"=="|">="|"<="|"<"|">"|"!=" return OPRTR; 

^[{]                    return OP_BR; // It is "{".

^[}]                    return CL_BR; // It is "}".

"("                     return OP_PR; // Paranthesis Open

")"                     return CL_PR; // Paranthesis Close


";"                     return SM_CLN;

"="                     return EQUAL;

"&"                     return NBSP;

","                     return COMMA;

[\n]                    ; /* Skip these characters. */

.                       ; /* Skip.   */

%%


int yywrap (void)
{
    return 1;
}


The Output file using YYDEBUG flag.

Starting parse
Entering state 0
Reading a token: Next token is token MAIN ()
Shifting token MAIN ()
Entering state 1
Reading a token: Next token is token OP_BR ()
Shifting token OP_BR ()
Entering state 3
Reading a token: Next token is token INT_DT ()
Shifting token INT_DT ()
Entering state 6
Reading a token: Next token is token VAR ()
Shifting token VAR ()
Entering state 9
Reading a token: Next token is token SM_CLN ()
Shifting token SM_CLN ()
Entering state 12
Reading a token: Next token is token BBS_ST ()
Shifting token BBS_ST ()
Entering state 7
Reading a token: Next token is token INT_LIT ()
Shifting token INT_LIT ()
Entering state 10
Reading a token: Next token is token BBS_END1 ()
Shifting token BBS_END1 ()
Entering state 13
Reading a token: Next token is token GOTO ()
Shifting token GOTO ()
Entering state 20
Reading a token: Next token is token BBS_ST ()
Shifting token BBS_ST ()
Entering state 29
Reading a token: Next token is token INT_LIT ()
Shifting token INT_LIT ()
Entering state 38
Reading a token: Next token is token BBS_END2 ()
Shifting token BBS_END2 ()
Entering state 47
Reducing stack by rule 10 (line 76):
   $1 = token GOTO ()
   $2 = token BBS_ST ()
   $3 = token INT_LIT ()
   $4 = token BBS_END2 ()
-> $$ = nterm bb ()
Stack now 0 1 3 6 9 12 7 10 13
Entering state 22
Reducing stack by rule 4 (line 68):
   $1 = token BBS_ST ()
   $2 = token INT_LIT ()
   $3 = token BBS_END1 ()
   $4 = nterm bb ()
-> $$ = nterm functn ()
Stack now 0 1 3 6 9 12
Entering state 14
Reducing stack by rule 3 (line 67):
   $1 = token INT_DT ()
   $2 = token VAR ()
   $3 = token SM_CLN ()
   $4 = nterm functn ()
-> $$ = nterm functn ()
Stack now 0 1 3
Entering state 8
Reading a token: Next token is token BBS_ST ()
Error: popping nterm functn ()
Stack now 0 1 3
Error: popping token OP_BR ()
Stack now 0 1
Error: popping token MAIN ()
Stack now 0
Cleanup: discarding lookahead token BBS_ST ()
Stack now 0

Chris Hunt · Accepted Answer

I reorganized a few of your rules. The problem is that when functn sees a bb, it expects only one. As a result when your GOTO is reached no further tokens are expected. Allowing functns to appear after a bb statement should fix this and give you the behavior you're looking for.

functn: INT_DT VAR ";" functn
      | bb_stmt functn
      | bb_stmt
      ;

bb_stmt: BBS_ST /*P2*/ INT_LIT ">:" bb
       ;

If you want to continue to force statements to be only at the top of the function, you can do something like this in addition to the above:

main:    "{"     functn_decls /*P1*/   "}"   ;

functn_decls: INT_DT VAR ";" functn_decls
            | functn
            ;

functn: bb_stmt functn
      | bb_stmt
      ;

Implementation of grammar using Bison. Grammar control flow unexpected

Answers (2)

Related Questions