Why doesn't $$ accept type char?

Question

I don't understand why this is not working. I tried using int values with $$ and it worked perfectly, but unless there is another way, I want it to work with chars. This is my .l file:

%{
#include "y.tab.h"
%}
%option noyywrap
%option yylineno
%%
DEFINE return DEFINETAG;
BEGIN return BEGINTAG;
END return ENDTAG;
[A-Z]+[0-9] {strcpy(yylval.buf,yytext); return AUT;}
[a-z_]+(0|[1-9][0-9]*)? {strcpy(yylval.buf, yytext); return EST;}
(\{[^}*]*\})* {strcpy(yylval.buf, yytext); return CODC;}
[->;] return yytext[0];
[ \t\n] ;
. yyerror("Caracter Inválido");
%%

And this is my .y file:

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern int yylineno;
int r; int c;
%}
%union {char buf[50]; int val;}
%start gda
%token <buf> BEGINTAG ENDTAG DEFINETAG AUT EST CODC
%type <buf> desta daute dest dtraa dautt dtra
%%
gda  : gda desta dtraa devea {char ma[r][c]; printf("%d, %d\n",r,c);}
     |
     ;
desta: DEFINETAG BEGINTAG daute ENDTAG {$$=$3;}
     ;
daute: daute AUT dest {$$=$3;}
     |
     ;
dest : dest EST {r=r+1;$$=$2;}
     | EST {r=r+1;$$=$1;}
     ;
dtraa: DEFINETAG BEGINTAG dautt ENDTAG 
     ;
dautt: dautt AUT dtra 
     |
     ;    
dtra : dtra EST '-''>' EST {c=c+1;}
     | EST '-''>' EST {c=c+1;}
     ;
devea: devea AUT {printf("void %s(){\n",$2);} BEGINTAG deve ENDTAG {printf("}\n");}
     | AUT {printf("void %s(){\n",$1);} BEGINTAG deve ENDTAG {printf("}\n");}
     ;
deve : deve est CODC
     | 
     ;
est  : EST '-''>' EST {printf("if(estado==%s)estado=%s;\n",$1,$4);}
     |
     ;
%%
int main(){
yyparse();
return 0;
}
int yyerror(char *s){fprintf(stderr, "ERRO(%d):%s\n", yylineno,s); return 0;}

This is my error list:

gda2.y: In function ‘yyparse’:
gda2.y:16:12: error: incompatible types when assigning to type ‘char[50]’ from type ‘char *’
gda2.y:18:12: error: incompatible types when assigning to type ‘char[50]’ from type ‘char *’
gda2.y:21:18: error: incompatible types when assigning to type ‘char[50]’ from type ‘char *’
gda2.y:22:18: error: incompatible types when assigning to type ‘char[50]’ from type ‘char *’

rici · Accepted Answer

In C, it's not legal to assign to an array. For example, you can't do this:

 char c[50];
 c = "abc";  /* ILLEGAL */

Putting the array in a union doesn't get around the problem:

 union {
   char c[50];
   int  i;
 } u;
 u.c = "abc";  /* ILLEGAL */

However, oddly, you can assign one struct to another even if they include arrays:

 struct FiftyChars {
   char c[50];
 };

 struct FiftyChars a,b;
 strncpy(a.c, "abc", 49);
 b = a;                /* ¡LEGAL! */

bison and yacc don't let you avoid the rules of C. Since $$ = $2 is actually translated into something like:

yylval.buf = yystack[top - 2].buf;

whatever buf refers to needs to allow direct assignment, so it can't be an array. It can, however, be a struct whose only member is an array, which means you could change your %union declaration to

%union {struct {char c[50];} buf; int val;}

and then make appropriate changes throughout your code (such as changing buf to buf.c in your flex input file, and $n to $n.c in your printf's).

Alternatively, you could avoid the issue by just copying the strings with strncpy, and writing

strncpy($$, $1, 49);

instead of

$$ = $1;

in your actions. That should work fine with bison, although I don't know whether every implementation of yacc will handle the default action ($$ = $1) as a union copy (which would be fine) or as a typed copy (which would throw an error).

On the other hand, you might at some point need to ask yourself where the size 50 comes from. Is it absolutely clear that it's big enough? Do you check to make sure that a token is never longer than 49 characters? Is it possible that your code is full of potential buffer overruns?

Once you start thinking this way, you will see that it is better to define your union with a pointer to a dynamically-allocated buffer:

%union {char* buf; int val;}

Now there is no problem with assigning buf in one semantic value's union to a buf in another semantic value, and you can just use strdup in your flex file to initialize the buf field. But you now have another problem, which is that you need to free all those dynamically-allocated names, or live with having a memory leak.

Living with the memory leak is possible, though ugly, if all you are doing is building a small one-pass compiler. After all, when the program terminates, all of its memory will be released without you needing to do anything. That was the traditional style, although I suspect that the vast majority of programmers reading this answer will be outraged by the suggestion.

Unfortunately, it's not so easy to fix. The semantic values are passed from one stack position to another, and there's no easy way to know how many pointers there are to a given string of characters, or when the string is no longer needed. If you were using C++ as a base language instead of C, you could use std::string, which will deal with all these issues as well as the allocation of appropriately large buffers, but at the cost of doing quite a bit of unnecessary string copying. Or you could use shared pointers to std::strings, which would do the reference counting for you, again at a certain run-time cost.

My solution, for some time, has been to maintain in the lexer a dictionary of "interned" (that is, unique) strings, and have the lexer return a pointer to a (const) unique string. That costs a hash-table lookup for every string token, but it effectively deals with the garbage-collection problem. At the end of the parse, the lexer and its associated hash table of unique strings can simply be deleted. (Of course, if a string needs to survive longer than the lexer, it needs to be copied. But in many cases, that's just not a problem.)
