Reputation: 3690
I'm trying to fully understand the stack growth mechanism on function calls and I feel a bit confused. In order to better understand I wrote the following simple program:
#include <stdio.h>
#include <stdint.h>
void calle(uint32_t* p)
{
    uint32_t tmp = 9;
    printf("callee - tmp is located at address location:%p and p is:%p \n", &tmp, p);
}
void caller()
{
    uint32_t tmp1 = 12;
    printf("caller - address of tmp1:%p \n", &tmp1);
    calle(&tmp1);
}
int main(int argc, char** argv)
{
    caller();
    return 0;
}
Using an online assembly converter, I got the following assembly output (I kept only the code of the calle function):
.LC0:
.string "callee - tmp is located at address location:%p and p is:%p \n"
calle:
push rbp
mov rbp, rsp
sub rsp, 32 // command 1
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-4], 9 // command 2
mov rdx, QWORD PTR [rbp-24]
lea rax, [rbp-4]
mov rsi, rax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
nop
leave
ret
As I understand it, commands 1 and 2 (marked above) show that the stack indeed grows down towards lower addresses. A sample output of the program, compiled with gcc myProg.c -o prog, is as follows:
caller - address of tmp1:0x7ffe423e8ed4
callee - tmp is located at address location:0x7ffe423e8eb4 and p is:0x7ffe423e8ed4
Here the local variable allocated within the callee function is indeed located at a lower memory address than the local variable within the caller function. So far so good.
Yet, when I compile the program with the -O2 option (i.e. gcc -O2 myProg.c -o prog), a sample output looks like this:
caller - address of tmp1:0x7fff0d5bfa90
callee - tmp is located at address location:0x7fff0d5bfa94 and p is:0x7fff0d5bfa90
This time, the local variable allocated within the callee stack frame is located at a higher memory address than the local variable within the caller function.
So my question is: does the -O2 optimization option optimize to the point where the stack growth mechanism actually changes, or am I missing something?
gcc version: 7.3
architecture: x86_64
OS: Ubuntu 18.04.
Appreciate your clarifications.
Guy.
Upvotes: 3
Views: 228
Reputation: 364068
-O2 inlines functions, at which point the compiler is free to do stack allocation however it wants.
Relational comparison (< or >) of the addresses of separate objects (like tmp and tmp1) is technically undefined behaviour in C, so any > or < relationship between addresses based on function nesting is not an observable side effect that optimization needs to preserve under the as-if rule. Compilers don't even try to do so when inlining functions.
ISO C11 draft n1548, §6.5.8 Relational operators
5) When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined
Converting addresses to integers like uintptr_t, or printing them out and comparing them in your head, is not UB, but the results are still not guaranteed by anything.
Upvotes: 7
Reputation: 140960
Because calle was inlined into caller, the printf call from calle now sits inside the caller function; see godbolt.
Assembly output for gcc 7.3 -O2:
.LC0:
.string "calle - tmp is located at address location:%p and p is:%p \n"
calle:
sub rsp, 24
mov rdx, rdi
xor eax, eax
lea rsi, [rsp+12]
mov edi, OFFSET FLAT:.LC0
mov DWORD PTR [rsp+12], 9
call printf
add rsp, 24
ret
.LC1:
.string "caller - address of tmp1:%p \n"
caller:
sub rsp, 24
mov edi, OFFSET FLAT:.LC1
xor eax, eax
lea rsi, [rsp+8]
mov DWORD PTR [rsp+8], 12
call printf
lea rdx, [rsp+8]
lea rsi, [rsp+12]
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov DWORD PTR [rsp+12], 9
call printf
add rsp, 24
ret
main:
sub rsp, 8
xor eax, eax
call caller
xor eax, eax
add rsp, 8
ret
As you can see, the calle function was inlined into caller, so caller calls printf twice: first with the .LC1 string, then with the .LC0 string. The first call prints the address rsp+8, which is tmp1; the second prints rsp+12, which is tmp. gcc is free to choose the order in which it places the variables.
You can put the __attribute__((__noinline__)) attribute on calle to "fix" this, but you shouldn't expect variable addresses to have any particular order at all (except where the language guarantees one, as within arrays and structures).
P.S. Passing anything other than a void* pointer to the "%p" printf conversion specifier is technically undefined behaviour, so you should cast the printf argument to void* before printing: printf("caller - address of tmp1:%p \n", (void*)&tmp1);
Upvotes: 4