Yao0512
Yao0512

Reputation: 21

strcat implementation using pointers

Could anyone know that when I write in this way, the program crashes.

#include<stdio.h>
#include<stdlib.h>

void mystrcat(char *s,char *t) {
    while(*s++);
    s--;
    while(*s++ = *t++);
}

int main(void) {
    int size = 1024;
    char *s1, *s2;

    s1 = (char *)malloc(size);
    //s1[0] = '\0';   ********NOTE THIS********
    s2 = (char *)malloc(size);
    //s2[0] = '\0';   ********NOTE THIS********
    mystrcat( s1, "Hello " );
    mystrcat( s2, "World" );
    mystrcat( s1, s2 );
    printf( "\"%s\"\n", s1 );
    return 0;
}

But strangely, when I do not use those two "//" comments, it works!!! So why adding those simple s2[0] = '\0'; could make this program work.

Upvotes: 2

Views: 769

Answers (6)

haccks
haccks

Reputation: 106012

The statement

while(*s++);  

make no sense for first two calls of mystrcat and invoke undefined behavior. You should not read an uninitialized memory.

You need to chech whether the first argument passed is an empty string or not

void mystrcat(char *s,char *t) {
    if(strlen(s))
    {
        while(*s++);
        s--;
    }
    while(*s++ = *t++);
}

Upvotes: -1

jas
jas

Reputation: 69

As the other answers have said, you need to initialize that memory. You can do that in more than one way, but one way would be to use calloc instead of malloc. If you change these two lines:

s1 = (char *)malloc(size);    
s2 = (char *)malloc(size);

to:

s1 = calloc(size,sizeof(*s1));   
s2 = calloc(size,sizeof(*s2));

Your program will run.

Upvotes: 1

Sourav Ghosh
Sourav Ghosh

Reputation: 134326

The returned pointer by malloc() is not guaranteed to be 0-filled, (or, initialized to any value, at all, for that matter). So other than the s1[0] = '\0'; part, while(*s++); may not be doing what you're expecting.

Without the initial zeroing part, while(*s++); cannot prevent the read-before-write scenario.

It is undefined behavior because of

  1. Reading unitialized memory location (indeterminate value)
  2. Going past the allocated memory, in search of terminating null.

In this case, as pointed out by Mr. Peter in the comments, however, the first point itself causes the UB and there is no guarantee that it will reach to the second point. However, in some other scenario, even if the memory is initialized but not null-terminated, you'll hit the second point to invoke the UB.

Upvotes: 3

Born2Smile
Born2Smile

Reputation: 2954

When you call malloc, you receive your char* to some memory. You own size bytes of that memory. But malloc does not prepare the memory you now own in any way. It is left in whatever state the previous owning process left it in. Thus, it could very likely be the case that the memory you receive already contains a string that is longer than size.

Thus when you begin your strcat and run to the end of the first string it will run past the length of size, and attempt to start writing to this memory. Here the problem arises, because you don't own the memory at that location, and thus the program segfaults.

On the other hand, if you initialise the strings by letting the first byte be "\0", you are in effect setting the length of the string to 0 (because there are 0 bytes before the end token: "\0"). Thus when you begin your strcat, it will again run to the end of the first string, but this time the end is within size.

Beware you may still run into problems if the combined string would be longer than size.

Upvotes: 0

Viktor Simk&#243;
Viktor Simk&#243;

Reputation: 2627

In C every string is terminated by the '\0' character.
malloc just allocates the memory, it doesn't writes the '\0' for You.
If You don't add it the program won't know where is the end of the string, and propably will try to read some memory after the actual string, that is not allocated, so it will cause undefined behaviour.

Here actually the mystrcat function increments the pointer until it points to a '\0' character or a 0.
But if there's no 0 found in the allocated memory, then after the next incrementation of the pointer, it will point to some unallocated memory.
Dereferencing it now will cause undefined behaviour.

Upvotes: 1

Some programmer dude
Some programmer dude

Reputation: 409176

When you allocate memory, either through the old C malloc function or the C++ new operator, that memory is not initialized in any way. Reading that memory as it were initialized leads to undefined behavior, and undefined behavior (or UB as it's often shortened) is one of the main reasons for crashes.

Upvotes: 4

Related Questions