Katia

Reputation: 729

yacc output help

If I have this grammar, for example:

start : TKN id '{' '}' {cout<<$2<<endl;} ;

Shouldn't the output be aaa? Sometimes it prints } and 0, and sometimes it just hangs.

I want to get the value of id. How do I get it right?

lex.l

%{
  #include "yacc.hpp"
  #include <math.h>
  #include<iostream>
  #include<string>
  int rows = 1,tmp=0;
%}
Id          [a-zA-Z_][0-9a-zA-Z_]*
%x String ...
%option c++
%option noyywrap
%% 
{Id}        {strcpy(yylval.strVal,yytext);cout<<"lex= "<<yytext<<endl;return Id;}//output line 1
...

yacc.y

%output ="yacc.cpp"
%defines
%verbose
%token Id
%token NAMESPACE
%union{

int           iVal;
float         fVal;
char*         strVal;
class Symbol* symPtr;
class NS*     nsPtr;
};


%token <iVal> INT_VAL;
%token <fVal> F_VAL;
%token <strVal> STR_VAL INT FLOAT STRING Id ;
%type <nsPtr> namespacedecl  
%type <symPtr> ns_closer
%type <strVal> Q_ID

//-------------------------------------------
namespacedecl  : NAMESPACE Q_ID '{' '}'            {cout<<"ns= "<<$2<< endl ;} // output line 3
               | NAMESPACE Q_ID '{' typedecl_closer '}' ;

 Q_ID :          Q_ID '.' Id                       {cout<<$3<< endl ;$$ = $3;}
               | Id                                {$$ = $1;cout<<"qid="<<$$<<endl;} // output line 2

Of course the files are bigger than this, but copying and pasting everything would only make you get lost ^^

If there's a way to attach the files, please tell me, because I'm still a newbie here and it would be much easier than copy/paste.

And this is what I get when running:

[screenshot of the program output; image not preserved]

Thanks for responding.

Upvotes: 0

Views: 905

Answers (2)

Chris Dodd

Reputation: 126488

This depends heavily on what your lexer returns as yylval for the id token, as that is what gets copied onto the parser stack when the token is shifted, and thus what $2 refers to when the rule's action runs. At a guess, you have a lex rule something like:

[a-zA-Z_][a-zA-Z_0-9]*      { yylval.str = yytext; return id; }

with

%token<str>    id

in your parser. In this case, you're storing a pointer into flex's internal scanner buffer which contains 'aaa' at that instant, but will be overwritten by a later token, so by the time the action runs, the pointer in $2 points at something else. You need to copy the string somewhere where it won't be overwritten and set yylval to point at that. You could use strdup(3) to copy the string to a malloced buffer, which would solve this problem but might leave you with memory leaks.
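The overwriting is easy to reproduce outside flex. Here is a minimal sketch, assuming a scanner that reuses one buffer for every token; `token_buffer` and `scan_into` are hypothetical stand-ins made up for illustration, not flex's real internals.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <string.h>

// Hypothetical stand-in for the scanner's reusable token buffer.
static char token_buffer[64];

// Hypothetical helper: "scans" a token by overwriting the shared buffer.
static char* scan_into(const char* text) {
    strncpy(token_buffer, text, sizeof token_buffer - 1);
    token_buffer[sizeof token_buffer - 1] = '\0';
    return token_buffer;
}

// Like yylval.str = yytext: the saved pointer aliases the shared buffer.
std::string aliased_after_more_tokens() {
    char* aliased = scan_into("aaa");
    scan_into("{");
    scan_into("}");               // later tokens reuse the same buffer
    return std::string(aliased);  // now sees the last token scanned
}

// Like yylval.str = strdup(yytext): the saved pointer owns a private copy.
std::string copied_after_more_tokens() {
    char* copied = strdup(scan_into("aaa"));
    assert(copied != nullptr);
    scan_into("{");
    scan_into("}");
    std::string result(copied);   // the copy is unaffected by later tokens
    std::free(copied);            // the owner of the copy must free it
    return result;
}
```

In this sketch the aliased version ends up holding "}" (the last token scanned) while the copied version still holds "aaa", which matches the kind of output described in the question.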

edit

Well, with your additional info, it's quite surprising that your program doesn't crash: you strcpy the token text into yylval.strVal, but you never initialize yylval.strVal to point anywhere, so you're copying it into some random memory location. You need to stick something like yylval.strVal = (char*)malloc(strlen(yytext)+1); before the strcpy in the lex.l action to ensure it points at valid memory (the cast is needed because your scanner is compiled as C++), or just use the simpler and equivalent call to strdup, which combines the malloc and strcpy:

{Id}        { yylval.strVal = strdup(yytext); return Id; }
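To spell out the equivalence, here is a rough sketch of the two ways to make the copy. The function names `copy_with_malloc` and `copy_with_strdup` are made up for the example; in the scanner, the argument would be yytext and the result would be assigned to the yylval member.

```cpp
#include <cassert>
#include <cstdlib>
#include <string.h>

// Long form: reserve strlen+1 bytes (room for the '\0'), then copy into them.
char* copy_with_malloc(const char* text) {
    char* out = static_cast<char*>(std::malloc(strlen(text) + 1));
    assert(out != nullptr);
    strcpy(out, text);
    return out;
}

// Short form: strdup (POSIX) does the allocation and the copy in one call.
char* copy_with_strdup(const char* text) {
    char* out = strdup(text);
    assert(out != nullptr);
    return out;
}
```

Either way the result is a malloc'ed buffer, so whoever ends up holding the pointer (typically a parser action) has to free() it at some point, otherwise it leaks.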

Upvotes: 0

Loki Astari

Reputation: 264669

It depends on what the 'id' rule returned!

%%
start :             TKN id '{' '}' {cout<<$2<<endl;} ;

id :                ID             { $$ = "XXXXX"; }   // What is assigned to $$ here
                                                       // is what will be printed out by $2

%%

Note:
The convention is that terminals (tokens) are all caps (TKN), while non-terminals are lowercase (id). By this convention I would expect id to have a rule describing how it is expanded.

What I suspect you are doing is:

%%

id :                ID             { $$ = yytext; }

%%

Which is a pointer into the lex buffer. This is a volatile buffer. You can NOT rely on its content remaining unchanged (nor can you rely on it being '\0' terminated). What you need to do is make a copy of the token at the point you identify it.

%%

id :                ID             { $$ = strndup(yytext, yyleng); }

%%
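A small sketch of the length-bounded copy, assuming you have the token's start pointer and its length (flex's length variable is called yyleng). The helpers `token_as_string` and `token_as_cstring` are hypothetical names for this example only.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string.h>

// Copy exactly len characters into a std::string; no terminator is needed in the source.
std::string token_as_string(const char* text, std::size_t len) {
    return std::string(text, text + len);
}

// Same idea with a C string: strndup (POSIX) copies at most len characters
// and always appends a '\0'. The caller must free() the result.
char* token_as_cstring(const char* text, std::size_t len) {
    return strndup(text, len);
}
```

For example, with a buffer holding "aaa{}" and a length of 3, both helpers give back just "aaa", regardless of what follows the token in the buffer.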

Edit based on new input:

In these lines:

{Id}        {   strcpy(yylval.strVal,yytext);
                cout<<"lex= "<<yytext<<endl;
                return Id;
            }

The strcpy() is probably bad.

  • Have you reserved any space for the copy?
  • You can not rely on yytext being '\0' terminated.

The cout is dangerous as you can not rely on yytext being '\0' terminated.

I would do:

{Id}        {   cout << "lex=" << std::string(yytext,yytext+yyleng); // use yyleng
                return Id;
            }

Do not mess around with yacc structures in lex. It tightly couples your lex file to yacc (which is not necessary). Just return the token. yacc can then get the token value itself.

Then in the YACC File:

Q_ID :        Q_ID '.' ident        {$$ = $3; cout<<"Q.id="<<$$<<endl;}
           |  ident                 {$$ = $1; cout<<"ID  ="<<$$<<endl;}

ident :       Id                    {$$ = strndup(yytext, yyleng);}

For every terminal (Id) that carries a long token, have a non-terminal that decodes the terminal and generates the correct value in the yacc union structure. In this case that is the ident non-terminal. It just decodes the Id terminal and gets the token value set up correctly (so this Id does not need a type in the union structure).

Also Note:

  • It is traditional for terminals to be all caps. Id fails.
  • It is traditional for non-terminals to be lowercase. Q_ID fails.

Additionally:

This line:

%token <strVal> STR_VAL INT FLOAT STRING Id ;

Looks wrong, though it is hard to tell without knowing what INT, FLOAT and STRING are. I am guessing these are the terminal tokens for the keywords int,float,string. In which case you do not need to store the actual token string. The fact that you know it is INT/FLOAT or STRING is enough.

This should be:

%token <strVal> STR_VAL;
%type  <strVal> ident;

Upvotes: 1
