How could creating a string change the value pointed to by a const char*?

Question

I've written a function that takes a string and returns a const char* containing an encoded version of that string. I call this function and then create a new string. In doing so, I somehow inadvertently change the value pointed to by my const char*, something I thought was impossible.

However, when I don't use my own function but simply point a const char* at a hardcoded string literal, the value does not change when I create a string. Why is there a difference, and how can the value behind a const char* change at all?

#include <cstdio>
#include <string>

using namespace std;

// returns "@username@FIN"
const char* encodeUsername(string username)
{
    username = "@" + username + "@FIN";
    return username.c_str();
}

int main(void)
{
    string jack("jack");
    const char* encodedUsername = "@jack@FIN";
    string dummy("hi");
    printf("%s\n", encodedUsername); //outputs "@jack@FIN", as expected.

    string tim("tim");
    const char* encodedUsername2 = encodeUsername(tim);
    string dummy2("hi");
    printf("%s\n", encodedUsername2); //outputs "hi". Why?
}

Ishamael · Accepted Answer

To understand why this happens, you need to understand a few intrinsic properties of C++.

In C++, a pointer can keep pointing to memory that has already been freed. Many other languages do not let this happen, and it can hide some severe errors. For example, consider the following code:

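#include <cstring> // strcpy

// Deliberately broken: returns a dangling pointer.
char* moo()
{
    char* a = new char[20];
    strcpy(a, "hello");
    delete[] a; // the buffer is freed here...
    return a;   // ...but its (now invalid) address is still returned
}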

Note that even though I just deleted a, I can still return a pointer to it. The calling side will receive that pointer with no way of knowing that it points to freed memory. Moreover, if you print the returned value immediately, you will quite likely still see "hello", because delete usually does not zero out the memory it frees, although reading through that pointer is undefined behavior.

std::string is, roughly speaking, a wrapper around a char* that hides all the allocations and deallocations behind a convenient interface, so that you don't need to manage the memory yourself. The constructor of std::string and the operations on it allocate or reallocate the underlying array, and the destructor deallocates it.
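
To make the "wrapper around char*" idea concrete, here is a minimal sketch of such a class. TinyString is a hypothetical name, and this is a heavily simplified illustration, not how any real std::string is implemented (many implementations, for instance, store short strings inside the object itself). It shows the two points that matter here: the constructor allocates a buffer, and the destructor frees it, after which any pointer previously returned by c_str() dangles.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical, heavily simplified stand-in for std::string (illustration only).
class TinyString {
public:
    // Constructor: allocate a heap buffer and copy the characters in.
    explicit TinyString(const char* text)
        : size_(std::strlen(text)), data_(new char[size_ + 1]) {
        std::memcpy(data_, text, size_ + 1); // also copies the terminating '\0'
    }

    // Copy constructor: the copy gets its own buffer (deep copy).
    TinyString(const TinyString& other)
        : size_(other.size_), data_(new char[other.size_ + 1]) {
        std::memcpy(data_, other.data_, size_ + 1);
    }

    // Copy assignment is left out to keep the sketch short.
    TinyString& operator=(const TinyString&) = delete;

    // Destructor: release the buffer. Pointers obtained from c_str()
    // point to freed memory once this has run.
    ~TinyString() { delete[] data_; }

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_; // declared before data_, so it is initialized first
    char* data_;
};
```

A by-value TinyString parameter would behave exactly like the username parameter in the question: its buffer is released when the copy is destroyed.
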
When you pass something to a function by value (as encodeUsername does with its username parameter), the function gets a new object holding a copy of what you passed, and that copy is destroyed when the function ends. So as soon as encodeUsername returns, username is destroyed, because it was passed by value and lives in the function's scope. Its destructor runs and deallocates the string's buffer, so the pointer you obtained by calling c_str() now points to memory that no longer belongs to any string.
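
The destruction of a by-value parameter can be observed directly. The sketch below uses a hypothetical Tracked type that counts destructor calls and a hypothetical lengthOf function that, like encodeUsername, takes its argument by value. Strictly speaking, the standard leaves it implementation-defined whether a parameter is destroyed when the function returns or at the end of the full-expression containing the call; either way the copy is gone before the next statement runs, which is why encodedUsername2 is already dangling on the line after the call.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper type for illustration: counts destructor calls.
struct Tracked {
    static int destroyed; // number of Tracked objects destroyed so far
    std::string value;

    explicit Tracked(std::string v) : value(std::move(v)) {}
    Tracked(const Tracked&) = default;
    ~Tracked() { ++destroyed; }
};

int Tracked::destroyed = 0;

// Takes its argument by value, like encodeUsername in the question:
// `copy` is a brand-new object that is destroyed once the call is over.
std::size_t lengthOf(Tracked copy)
{
    return copy.value.size();
}
```

Calling lengthOf(original) leaves original intact but increases Tracked::destroyed by one, because the copy made for the parameter has been destroyed.
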
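When you allocate an object immediately after a deallocation, it is very likely to reuse the memory that was just freed. In your case, when you create the next string, dummy2, it very likely ends up in the same memory that username's characters occupied (for short strings like these, many implementations keep the characters inside the string object itself, so this may be reused stack space rather than a heap block), which is why printing encodedUsername2 shows "hi".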

Now, how can you fix it?

First, if you don't care about the input string (that is, if you are OK with overwriting it), you can simply pass it by reference:

const char* encodeUsername(string& username)

This fixes the problem because username is no longer a copy, so it is not destroyed at the end of the function, and the returned pointer stays valid as long as the caller's string is alive and unmodified. The downside is that the function now changes the string you pass in, which is undesirable and makes for an unintuitive interface.
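
A minimal sketch of this pass-by-reference variant, assuming the same encoding as in the question, shows both effects: the returned pointer survives the creation of another string, but the caller's string is overwritten.

```cpp
#include <string>

// Pass-by-reference variant: rewrites the caller's string in place.
const char* encodeUsername(std::string& username)
{
    username = "@" + username + "@FIN";
    // Valid only while the caller's string is alive and not modified again.
    return username.c_str();
}
```

For example, after std::string tim("tim"); const char* p = encodeUsername(tim);, both p and tim read "@tim@FIN", so the original "tim" is lost.
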

Second, you can allocate a new char array before returning it, and then free it at the end of the calling function:

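char* encodeUsername(string username)
{
    username = "@" + username + "@FIN";
    return strdup(username.c_str()); // strdup (POSIX, <string.h>) copies into malloc'd memory
}

and then, in main, keep the result in a non-const char* (free takes a void*, so it cannot be given a const char* directly) and free it at the end:

char* encodedUsername2 = encodeUsername(tim);
// ... use encodedUsername2 ...
free(encodedUsername2); // free is declared in <cstdlib>

(note that you have to use free and not delete[] because the array was allocated by strdup, which uses malloc; and do not free encodedUsername, which points to a string literal that was never allocated)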

This works because the returned char array is allocated on the heap right before the function returns and is not freed when username is destroyed. The price is that the calling function now needs to free it, which is, again, an unintuitive interface.
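
Since strdup comes from POSIX rather than the C++ standard library, a portable variant can copy into a new[] buffer instead. This is a swapped-in alternative, not the answer's method: encodeUsernameCopy is a hypothetical name, and because the buffer comes from new[], the caller must release it with delete[] (not free), mirroring the free-versus-delete[] rule above.

```cpp
#include <cstring>
#include <string>

// Hypothetical standard-C++ variant of the strdup approach.
// The caller owns the returned buffer and must release it with delete[].
char* encodeUsernameCopy(const std::string& username)
{
    const std::string encoded = "@" + username + "@FIN";
    char* buffer = new char[encoded.size() + 1];              // +1 for the terminating '\0'
    std::memcpy(buffer, encoded.c_str(), encoded.size() + 1); // copy including '\0'
    return buffer; // still valid after `encoded` is destroyed
}
```

This has the same ownership drawback as the strdup version: every caller has to remember the matching delete[].
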

Finally, the proper solution is to return a std::string instead of a char pointer, so that the std::string takes care of all the allocations and deallocations for you:

string encodeUsername(string username)
{
    username = "@" + username + "@FIN";
    return username;
}

And then in the main function:

string encodedUsername2 = encodeUsername(tim);
printf("%s\n", encodedUsername2.c_str());

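Putting the last approach together, a minimal sketch might look as follows. It takes username by const reference instead of by value (a small design choice, since the function no longer needs to modify its argument), and printEncoded is a hypothetical helper mirroring the question's main().

```cpp
#include <cstdio>
#include <string>

// Returns the encoded name by value: the std::string owns its buffer,
// so there is no dangling pointer and nothing for the caller to free.
std::string encodeUsername(const std::string& username)
{
    return "@" + username + "@FIN";
}

// Hypothetical helper mirroring the question's main(): keep the result in a
// named std::string and call c_str() only where a raw pointer is needed.
void printEncoded(const std::string& name)
{
    const std::string encoded = encodeUsername(name);
    const std::string dummy("hi");        // no longer affects `encoded`
    std::printf("%s\n", encoded.c_str()); // for "tim" this prints @tim@FIN
}
```

One caveat remains: the pointer from c_str() is only valid while the std::string that produced it is alive, so writing const char* p = encodeUsername(tim).c_str(); would dangle again, because the temporary string is destroyed at the end of that statement.
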