Reputation: 6440
Consider the following code snippet.
#include <stdio.h>

typedef struct s {
    int _;
    char str[];
} s;

s first = { 0, "abcd" };

int main(int argc, const char **argv) {
    s second = first;
    printf("%s\n%s\n", first.str, second.str);
}
When I compile this with GCC 7.2, I get:
$ gcc-7 -o tmp tmp.c && ./tmp
abcd
abcd
But when I compile this with Clang (Apple LLVM version 8.0.0 (clang-800.0.42.1)), I get the following:
$ clang -o tmp tmp.c && ./tmp
abcd
# Nothing here
Why does the output differ between the compilers? I would expect the string not to be copied, as it's a flexible array member (similar to this question). Why does GCC actually copy it?
Edit
Some comments and an answer suggested this might be due to optimization. GCC may make second an alias of first, so updating second should prevent GCC from doing that optimization. I added the line:
second._ = 1;
But this doesn't change the output.
Upvotes: 5
Views: 896
Reputation: 10048
Both compilers are behaving correctly; the output you are seeing is the result of undefined behavior.
GCC
Because you never modify second, GCC is simply making second an alias of first in its lookup table. Modify second and GCC cannot make that optimization, and you'll get the same answer/crash as Clang.
Clang
Clang does not automatically apply the same optimization, it seems. So when it copies the structure, it does so correctly: it copies the single int and nothing else.
You were lucky that there was a zero value on the stack after your local second variable, terminating your unknown character string. In effect, you are reading memory that was never initialized, much as if you had used an uninitialized pointer. Were there no zero, you could have gotten a lot of garbage and a memory fault.
The purpose of a flexible array member is to do low-level work, such as implementing a memory manager, by casting some memory to your structure. The compiler is under no obligation to understand what you are doing; it is only under obligation to act as if you know what you are doing. If you fail to cast the structure type over memory that actually has data of that type in it, all bets are off.
edit
So, using godbolt.org and looking at the assembly:
.LC0:
.string "%s\n%s\n"
main:
sub rsp, 24
mov eax, DWORD PTR first[rip]
mov esi, OFFSET FLAT:first+4
lea rdx, [rsp+16]
mov edi, OFFSET FLAT:.LC0
mov DWORD PTR [rsp+12], eax
xor eax, eax
call printf
xor eax, eax
add rsp, 24
ret
first:
.long 0
.string "abcd"
We see that GCC is, actually, doing exactly what I said with the OP’s original code: treating second as an alias of first.
Tom Karzes has significantly modified the code, and so is experiencing a different issue. What he reports does appear to be a bug; I haven’t time ATM to figure out what is really happening with his stack-corrupting assignment.
Upvotes: 0
Reputation: 24052
Here's the real answer of what's going on with gcc. second is allocated on the stack, just as you'd expect. It is not an alias for first. This is easily verified by printing their addresses.
Additionally, the declaration s second = first; is corrupting the stack, because (a) gcc is allocating the minimum amount of storage for second but (b) it is copying all of first into second.
Here is a modified version of the original code which shows this:
#include <stdio.h>

typedef struct s {
    int _;
    char str[];
} s;

s first = { 0, "abcdefgh" };

int main(int argc, const char **argv) {
    char v[] = "xxxxxxxx";
    s second = first;
    printf("%p %p %p\n", (void *) v, (void *) &first, (void *) &second);
    printf("<%s> <%s> <%s>\n", v, first.str, second.str);
}
On my 32-bit Linux machine, with gcc, I get the following output:
0xbf89a303 0x804a020 0xbf89a2fc
<defgh> <abcdefgh> <abcdefgh>
As you can see from the addresses, v and second are on the stack, and first is in the data section. It is also clear that the initialization of second has overwritten v on the stack: instead of the expected <xxxxxxxx>, it shows <defgh>.
This seems like a gcc bug to me. At the very least, it should warn that the initialization of second will corrupt the stack, since it clearly has enough information to know this at compile time.
Edit: I tested this some more, and obtained essentially equivalent results by splitting the declaration of second into:
s second;
second = first;
The real problem is the assignment. It's copying all of first, rather than only the fixed part of the structure type, which is what I believe it should do. In fact, if you move the static initialization of first into a separate file, the assignment does what it should do: v prints correctly, and second.str is undefined garbage. This is the behavior gcc should be producing, regardless of whether the initialization of first is visible in the same compilation unit or not.
Upvotes: 4