Caiyi Zhou

Reputation: 141

Why can we return char* from function?

Here is a piece of C++ code that shows some very peculiar behavior. Can anyone tell me why strB() prints the string correctly?

#include <iostream>
using namespace std;

char* strA()
{
    char str[] = "hello word";
    return str;
}

char* strB()
{
    char* str = "hello word";
    return str;
}

int main()
{ 
    cout<<strA()<<endl;  
    cout<<strB()<<endl;
}

Upvotes: 2

Views: 668

Answers (3)

Dmytro

Reputation: 5213

case 1:

#include <stdio.h>

char *strA() {
    char str[] = "hello world";
    return str;
}

int main(int argc, char **argv) {
    puts(strA());
    return 0;
}

In char str[] = "hello world";, the array str is a local variable (usually placed on the stack), and its lifetime ends when the function exits, so the pointer returned by strA dangles. The array is still alive while strA is running, though, so you can kinda cheat by handing it to a continuation instead of returning it. The continuation is called ON TOP of strA's stack frame, so the data still exists because strA hasn't returned yet:

#include <stdio.h>

void strA(void (*continuation)(char *)) {
    char str[] = "hello world";
    continuation(str);
}

void myContinuation(char *arg) {
    puts(arg);
}

int main(int argc, char **argv) {
    strA(myContinuation);
    return 0;
}

case 2: If you use the snippet below, the literal "hello world" is usually stored in protected read-only memory. Trying to modify this string is undefined behavior and causes a segmentation fault on many systems. This is similar to how your main and strA are stored: C code is basically just a string of instructions (a memory blob) in the same way a string is a string of characters, but I digress. The string is available to the program even if the function is never called, as long as you know the address it is supposed to be at on the specific system. In the snippet below, the program prints the string through a hard-coded address before strB is ever called. This will often work on the same platform with roughly the same code and the same compiler, but it is undefined behavior.

#include <stdio.h>

char *strB() {
    char *str = "hello world";
    return str;
}

int main(int argc, char **argv) {
    char *myStr;

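    // On the first run, comment out the line below and note the address
    // printed at the end, then put that address here (in my case it was
    // 4231168). Reading through a hard-coded address is undefined behavior.
    printf("is your string: %s.\n", (char *)4231168);
    myStr = strB();
    // %p expects a void pointer; the address is typically printed in hex.
    printf("str is at: %p\n", (void *)myStr);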
    return 0;
}

You can opt for a strC that uses a struct for relative safety. The structure is created on the stack and returned in full, by value, so the caller gets its own copy. The return value of strC is 81 bytes in size (an arbitrary number I made up for the structure, which I trust myself to respect).

#include <stdio.h>

typedef struct {
     char data[81];
} MY_STRING;

MY_STRING strC() {
    MY_STRING str = {"what year is this?"};
    return str;
}
    
int main(int argc, char **argv) {
    puts(strC().data);
    printf("size of strC's return: %zu.\n", sizeof(strC()));

    return 0;
}
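
The same return-by-value idea can be sketched in C++ with std::array. This is only an illustration, assuming C++11 or later. FixedString and makeFixed are hypothetical names that do not appear in the code above, and 81 is kept only to mirror the struct.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Hypothetical fixed-size string type; 81 mirrors MY_STRING above.
using FixedString = std::array<char, 81>;

// Copies text into a fixed-size buffer and returns the buffer by value.
FixedString makeFixed(const char* text)
{
    // The text plus its terminating '\0' is expected to fit in the buffer.
    assert(std::strlen(text) < 81);

    FixedString s{};                                // every byte starts as '\0'
    std::strncpy(s.data(), text, s.size() - 1);     // last byte is never written
    return s;                                       // the caller gets its own copy
}
```

Something like puts(makeFixed("what year is this?").data()); should then print the text, since the returned temporary lives until the end of that full expression.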
  

tldr; the array in strA is likely corrupted as soon as strA returns, since the next call (puts or printf, for example) reuses the same stack space for its own frame, whereas the string used in strB exists outside the function. It's basically a pointer to a global constant that is available as soon as the program starts (the string is there in memory, no different from how the code is in memory).

Upvotes: 0

songyuanyao

Reputation: 172894

String literals exist for the life of the program.

String literals have static storage duration, and thus exist in memory for the life of the program.

That means cout<<strB()<<endl; is fine: the returned pointer points to the string literal "hello word" and remains valid.

On the other hand, cout<<strA()<<endl; leads to UB. The returned pointer points to the 1st element of the local array str, which is destroyed when strA() returns, leaving the returned pointer dangling.


BTW: String literals are of type const char[N], so char* str = "hello word"; is invalid since C++11. Change it to const char* str = "hello word";, and change the return type of strB() to const char* too.

String literals are not convertible or assignable to non-const CharT*. An explicit cast (e.g. const_cast) must be used if such conversion is wanted. (since C++11)
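
A minimal sketch of both fixes, assuming C++11 or later. The names strA_fixed, strB_fixed and selfCheck are hypothetical, and returning std::string from strA is just one possible way to avoid the dangling pointer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Pointer to a string literal: the literal has static storage duration,
// so the pointer is still valid after the function returns.
const char* strB_fixed()
{
    const char* str = "hello word";
    return str;
}

// Returns the characters by value instead of a pointer to a local array.
std::string strA_fixed()
{
    char str[] = "hello word";   // local array, destroyed on return
    return std::string(str);     // the std::string holds its own copy
}

// Hypothetical self-check: both functions produce the expected text.
void selfCheck()
{
    assert(std::strcmp(strB_fixed(), "hello word") == 0);
    assert(strA_fixed() == "hello word");
}
```

With these versions, both cout<<strA_fixed()<<endl; and cout<<strB_fixed()<<endl; are well defined.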

Upvotes: 4

ph3rin

Reputation: 4848

Why does strB() work?

A string literal (e.g. "a string literal") has static storage duration. That means its lifetime spans the duration of your program's execution. This can be done because the compiler knows every string literal that you are going to use in your program, hence it can store their data directly in a data section (typically a read-only one) of the compiled executable (example: https://godbolt.org/z/7nErYe).

When you obtain a pointer to it, this pointer can be passed around freely (including being returned from a function) and dereferenced as the object it points to is always alive.

Why doesn't strA() work?

However, initializing an array of char from a string literal copies the content of the string literal. The created array is a different object from the original string literal. If such array is a local variable (i.e. has automatic storage duration), as in your strA(), then it is destroyed after the function returns.

When you return from strA(), since the return type is char*, an "array-to-pointer conversion" is performed, creating a pointer to the first element of the array. However, since the array is destroyed when the function returns, the returned pointer becomes invalid. You should not try to dereference such pointers (and should avoid creating them in the first place).
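
To illustrate that the array is a separate object from the literal, here is a small sketch assuming C++11 or later. literalText and editedCopy are hypothetical names, and copying the characters out into a std::string is only one way to avoid the invalid pointer.

```cpp
#include <cassert>
#include <string>

// Returns a pointer to the literal itself (static storage duration).
const char* literalText()
{
    return "hello word";
}

// Copies the literal into a local array, edits the copy, and returns the
// characters by value so nothing refers to the destroyed array.
std::string editedCopy()
{
    char str[] = "hello word";                    // distinct object: a copy
    assert(sizeof(str) == sizeof("hello word"));  // 10 characters plus '\0'
    str[0] = 'H';                                 // changes only the copy
    return std::string(str);                      // copied out before str dies
}
```

Calling editedCopy() leaves the text behind literalText() unchanged, because only the local copy was written to.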

Upvotes: 8
