Hemal

Reputation: 75

YACC - strlen of $1 is 0 although string is there

I am getting a strange error in my program.

My YYSTYPE is declared as:

%union
{
        char *text;
        node *n;
}
%token <text> NUMBER

and the grammar rule is:

P:
        NUMBER
        {
                cout<<"$1 : "<<$1<<endl;
                int i = 0;
                while($1[i])
                {
                        cout<<"char : "<<$1[i++]<<endl;
                }
                $<n>$->left = $<n>$->right = NULL;
                char *test1 = new char[strlen($1)];
                strcpy(test1, $1);
                cout<<"len : "<<strlen($1)<<"test1 : "<<test1<<endl;
                char *lolz = strdup($1);
                cout<<"dup : "<<((uint64_t)lolz)<<' '<<((int)lolz[1])<<" : dup"<<endl;
                $<n>$->data = string($1);
                cout<<"nd : "<<$<n>$->data<<endl;
                print_tree($<n>$);
        }
        ;

I can print the contents of $1, but strlen($1) returns 0. This causes the strdup and the std::string initialisation to fail.

Output:

$1 : 65301
char : 6
char : 5
char : 3
char : 0
char : 1
len : 0test1 :
dup : 26935504 0 : dup
Segmentation fault (core dumped)

Am I missing something obvious here?

Upvotes: 0

Views: 245

Answers (1)

rici

Reputation: 241861

When you execute:

$<n>$->left = $<n>$->right = NULL;

what do you suppose the value of $<n>$ is? Have you assigned it the address of a node object?

To save you some time: you haven't assigned it, so you could think of it as an uninitialised pointer. Dereferencing an uninitialised pointer is Undefined Behaviour (UB), and that is consistent with what you see.

But that analysis is not quite accurate.

The bison-generated parser initialises $$ to $1 before executing the action. In this case, $1 is a union whose text member has been assigned to, so using the n member is (a different) UB. The result is the same, but with common compilers it is more predictable: $<n>$ simply reinterprets the string pointer as a node pointer, so $<n>$->left and $<n>$->right refer to the bytes of the character string itself. I suppose that left and right are the first two members of node, so the assignment above overwrites the first 16 bytes of the character string with zeros (8 if you have a 32-bit architecture). That's likely a buffer overrun, but if it doesn't segfault, the end result is that the first byte of $1 is 0, hence the return value of strlen. It also explains why your loop printed the digits correctly: it ran before the assignment. (When you try to use the data element, it does segfault, apparently, presumably because that memory is not an initialised std::string. A zero-length C-string would not be a problem for either strdup or the std::string constructor.)

Moral: never assign through a pointer if you don't know what it points to.


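By the way, the strcpy to test1 is a one-byte buffer overrun: strlen does not count the terminating NUL, so new char[strlen($1)] is one byte short of what strcpy writes. You seem to have gotten away with it this time, but it's a bad habit.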

Upvotes: 2
