bbg

Reputation: 321

Why does this bison code produce unexpected output?

flex code:

%option noyywrap nodefault yylineno case-insensitive
%{
#include "stdio.h"
#include "tp.tab.h"
%}

%%
"{"             {return '{';}
"}"             {return '}';}
";"             {return ';';}
"create"        {return CREATE;}
"cmd"           {return CMD;}
"int"           {yylval.intval = 20;return INT;}
[a-zA-Z]+       {yylval.strval = yytext;printf("id:%s\n" , yylval.strval);return ID;}
[ \t\n]
<<EOF>>         {return 0;}
.               {printf("mystery char\n");}

bison code:

  1 %{
  2 #include "stdlib.h"
  3 #include "stdio.h"
  4 #include "stdarg.h"
  5 void yyerror(char *s, ...);
  6 #define YYDEBUG 1
  7 int yydebug = 1;
  8 %}
  9 
 10 %union{
 11     char *strval;
 12     int intval;
 13 }
 14 
 15 %token <strval> ID
 16 %token <intval> INT
 17 %token CREATE
 18 %token CMD
 19 
 20 %type <strval> col_definition
 21 %type <intval> create_type
 22 %start stmt_list
 23 
 24 %%
 25 stmt_list:stmt ';'
 26 | stmt_list stmt ';'
 27 ;
 28 
 29 stmt:create_cmd_stmt         {/*printf("create cmd\n");*/}
 30 ;
 31 
 32 create_cmd_stmt:CREATE CMD ID'{'create_col_list'}'    {printf("%s\n" , $3);}
 33 ;
 34 create_col_list:col_definition
 35 | create_col_list col_definition
 36 ;
 37 
 38 col_definition:create_type ID ';' {printf("%d , %s\n" , $1, $2);}
 39 ;
 40 
 41 create_type:INT {$$ = $1;}
 42 ;
 43 
 44 %%
 45 extern FILE *yyin;
 46 
 47 void
 48 yyerror(char *s, ...)
 49 {
 50     extern int yylineno;
 51     va_list ap;
 52     va_start(ap, s);
 53     fprintf(stderr, "%d: error: ", yylineno);
 54     vfprintf(stderr, s, ap);
 55     fprintf(stderr, "\n");
 56 }
 57 
 58 int main(int argc , char *argv[])
 59 {
 60     yyin = fopen(argv[1] , "r");
 61     if(!yyin){
 62         printf("open file %s failed\n" ,argv[1]);
 63         return -1;
 64     }
 65 
 66     if(!yyparse()){
 67         printf("parse work!\n");
 68     }else{
 69         printf("parse failed!\n");
 70     }
 71 
 72     fclose(yyin);
 73     return 0;
 74 }
 75

test input file:

create cmd keeplive
{
    int a;
    int b;
};

test output:

root@VM-Ubuntu203001:~/test/tpp# ./a.out t1.tp 
id:keeplive
id:a
20 , a;
id:b
20 , b;
keeplive
{
    int a;
    int b;
}
parse work!

I have two questions:

1) Why does the action at line 38 print the token ';'? For instance, "20 , a;" and "20 , b;"

2) Why does the action at line 32 print "keeplive { int a; int b; }" instead of simply "keeplive"?

Upvotes: 2

Views: 426

Answers (1)

rici

Reputation: 241911

Short answer:

yylval.strval = yytext;

You can't use yytext like that. The string it points to is private to the lexer and is no longer reliable once the flex action finishes. You need to copy it instead, for example with strdup (declared in <string.h>):

yylval.strval = strdup(yytext);

and then you need to make sure you free the memory afterwards.


Longer answer:

yytext is actually a pointer into the buffer containing the input. To make yytext behave like a NUL-terminated string, the flex framework overwrites the character following the token with a NUL before it runs the action, and puts the original character back before it scans the next token. So strdup works fine inside the action, but outside the action (in your bison code) you are left with a pointer into the buffer at the start of the token, and the terminating NUL now sits after whichever token was scanned most recently. That is what your output shows: the action at line 38 runs after ';' has been scanned, so $2 reads "a;", and the action at line 32 runs after '}' has been scanned, so $3 reads everything from "keeplive" through the closing brace.

It gets worse later, since flex will read the next part of the source into the same buffer, and then your pointer refers to unrelated data. There are several possible scenarios, depending on flex options, but none of them are pretty.
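
To make the buffer behaviour concrete, here is a minimal sketch in plain C that imitates the in-place NUL patching. It is not flex's real code: the names toy_scanner and toy_next are made up for illustration, and the token lengths are passed in by hand instead of being found by pattern matching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a scanner (hypothetical, not flex internals):
   one shared buffer that is patched in place for each token. */
struct toy_scanner {
    char *buf;     /* input buffer, modified in place */
    size_t size;   /* number of input bytes in buf, excluding the final NUL */
    size_t pos;    /* offset where the next token starts */
    char hold;     /* character currently replaced by the temporary NUL */
    int patched;   /* nonzero while buf[pos] holds a temporary NUL */
};

/* Return a pointer to the next token of len bytes, NUL-terminated in place.
   The character hidden by the previous token's NUL is restored first. */
char *toy_next(struct toy_scanner *s, size_t len)
{
    assert(s != NULL && s->buf != NULL);
    assert(s->pos + len <= s->size);

    if (s->patched)
        s->buf[s->pos] = s->hold;   /* undo the previous token's NUL */

    char *tok = s->buf + s->pos;    /* token text lives inside the buffer */
    s->pos += len;
    s->hold = s->buf[s->pos];       /* remember what the NUL overwrites */
    s->buf[s->pos] = '\0';
    s->patched = 1;
    return tok;
}
```

If the buffer holds "a;" and the caller keeps the pointer returned for "a", that pointer reads "a" right after the first call and "a;" after the call that scans ";", which mirrors the "20 , a;" line in the question's output.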

So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.
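
The copy step can be sketched as a small C helper. The name copy_token is hypothetical; it does the same job as strdup but takes an explicit length, so it could be called with flex's yytext and yyleng.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate the len bytes at text into freshly
   allocated storage. Returns NULL if allocation fails. The caller owns
   the result and must free() it. */
char *copy_token(const char *text, size_t len)
{
    assert(text != NULL);

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, text, len);
    copy[len] = '\0';
    return copy;
}
```

Assuming the grammar from the question, the flex action would become yylval.strval = copy_token(yytext, yyleng); and the bison actions at lines 32 and 38 would call free($3) and free($2) respectively once they have printed the value.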

In almost all the lexers I've written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.
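
A symbol table of that kind might look roughly like the sketch below. This is only an illustration under simple assumptions, not the code from any particular lexer: the names symbol, intern and free_symbols are invented here, and a linked list stands in for whatever lookup structure a real table would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table: a singly linked list of owned name copies. */
struct symbol {
    char *name;
    struct symbol *next;
};

/* Find the len-byte name in *table, or add a copy of it.
   Returns the stored entry, or NULL on allocation failure. */
struct symbol *intern(struct symbol **table, const char *name, size_t len)
{
    assert(table != NULL && name != NULL);

    for (struct symbol *s = *table; s != NULL; s = s->next)
        if (strlen(s->name) == len && memcmp(s->name, name, len) == 0)
            return s;               /* already known: reuse the entry */

    struct symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = malloc(len + 1);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->name, name, len);
    s->name[len] = '\0';
    s->next = *table;               /* push onto the front of the list */
    *table = s;
    return s;
}

/* Release every entry once parsing is finished. */
void free_symbols(struct symbol **table)
{
    assert(table != NULL);
    while (*table != NULL) {
        struct symbol *next = (*table)->next;
        free((*table)->name);
        free(*table);
        *table = next;
    }
}
```

With this shape the table owns every name, so the parser never frees individual identifiers; a single free_symbols call at the end releases them all.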

Upvotes: 7
