Reputation: 27
I am fairly new to Yacc and Lex programming but I am training myself with a analyser for C programs.
However, I am facing a small issue that I didn't manage to solve.
When there is a declaration for example like int a,b;
I want to save a and b in an simple array. I did manage to do that but it saving a bit more that wanted.
It is actually saving "a," or "b;" instead of "a" and "b".
It should have worked as $1
should only return tID
which is a regular expression recognising only a string chain. I don't understand why it take the coma even though it defined as a token. Does anyone know how to solve this problem ?
Here is the corresponding yacc declarations :
Declaration :
tINT Decl1 DeclN
{printf("Declaration %s\n", $2);}
| tCONST Decl1 DeclN
{printf("Declaration %s\n", $2);}
;
Decl1 :
tID
{$$ = $1;
tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
printf("Added %s at adress %d\n", $1, compteur);
compteur++;}
| tID tEQ E
{$$ = $1;
tabvar[compteur].id=$1; tabvar[compteur].adresse=compteur;
printf("Added %s at adress %d\n", $1, compteur);
pile[compteur]=$3;
compteur++;}
;
DeclN :
/*epsilon*/
| tVIR Decl1 DeclN
And the extract of the Lex file :
separateur [ \t\n\r]
id [A-Za-z][A-Za-z0-9_]*
nb [0-9]+
nbdec [0-9]+\.[0-9]+
nbexp [0-9]+e[0-9]+
"," { return tVIR; }
";" { return tPV; }
"=" { return tEQ; }
{separateur} ;
{id} { yylval.str = yytext; return tID; }
{nb}|{nbdec}|{nbexp} { yylval.nb = atoi(yytext); return tNB; }
%%
int yywrap() {return 1;}
Upvotes: 0
Views: 306
Reputation: 126488
The problem is that yytext
is a reference into lex's token scanning buffer, so it is only valid until the next time the parser calls yylex
. You need to make a copy of the string in yytext
if you want to return it. Something like:
{id} { yylval.str = strdup(yytext); return tID; }
will do the trick, though it also exposes you to the possibility of memory leaks.
Also, in general when writing lex/yacc parsers involving single character tokens, it is much clearer to use them directly as charcter constants (eg ','
, ';'
, and '='
) rather than defining named tokens (tVIR
, tPV
, and tEQ
in your code).
Upvotes: 1