Guy Avraham

Reputation: 3690

Confusion regarding stack growth on x86_64 Linux

I'm trying to fully understand the stack growth mechanism on function calls and I feel a bit confused. In order to better understand I wrote the following simple program:

#include <stdio.h>
#include <stdint.h>

void calle(uint32_t* p)
{
    uint32_t tmp = 9;
    printf("callee - tmp is located at address location:%p and p is:%p \n", &tmp, p);
}

void caller()
{
    uint32_t tmp1 = 12;
    printf("caller - address of tmp1:%p \n", &tmp1);
    calle(&tmp1);
}

int main(int argc, char** argv)
{
    caller();
    return 0;
}

Using an online assembler converter, I got the following assembly output (I left only the code of the calle function):

.LC0:
    .string "callee - tmp is located at address location:%p and p is:%p \n"
calle:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 32 // command 1
    mov     QWORD PTR [rbp-24], rdi
    mov     DWORD PTR [rbp-4], 9 // command 2
    mov     rdx, QWORD PTR [rbp-24]
    lea     rax, [rbp-4]
    mov     rsi, rax
    mov     edi, OFFSET FLAT:.LC0
    mov     eax, 0
    call    printf
    nop
    leave
    ret

As I understand, taking into account commands 1 & 2 (noted above), the stack indeed grows down towards lower addresses, and a (sample) output of the compiled code, when I compile it using the command gcc myProg.c -o prog, is as follows:

caller - address of tmp1:0x7ffe423e8ed4

callee - tmp is located at address location:0x7ffe423e8eb4 and p is:0x7ffe423e8ed4

This shows that the local variable allocated within the callee function is indeed located at a lower memory address than the local variable within the caller function. So far so good.

Yet, when I compile the program with the -O2 option (i.e. gcc -O2 myProg.c -o prog), a (sample) output of the compiled code is as follows:

caller - address of tmp1:0x7fff0d5bfa90

callee - tmp is located at address location:0x7fff0d5bfa94 and p is:0x7fff0d5bfa90

This time the output shows that the local variable allocated within the callee stack frame is located at a higher memory address than the local variable within the caller function.

So my question is: does the -O2 optimization option optimize "up to" the point where the stack growth mechanism actually changes, or am I missing something here?

gcc version: 7.3

architecture: x86_64

OS: Ubuntu 18.04.

Appreciate your clarifications.

Guy.

Upvotes: 3

Views: 228

Answers (2)

Peter Cordes

Reputation: 364068

-O2 inlines functions, at which point the compiler is free to do stack allocation however it wants.

Relational comparison (< or >) of the addresses of separate objects (like tmp and tmp1) is technically undefined behaviour in C, so any kind of > or < relationship between addresses based on function nesting is not an observable side-effect that optimization needs to preserve when following the as-if rule. Compilers don't even try to do so when inlining functions.

ISO C11 draft n1548, §6.5.8 Relational operators

5) When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined

Converting addresses to integers like uintptr_t, or printing them out and comparing in your head, is not UB, but the standard still guarantees nothing about the results.

Upvotes: 7

KamilCuk

Reputation: 140960

Because calle, including its printf call, got inlined into the caller function; see godbolt.

Assembly output for gcc 7.3 -O2 :

.LC0:
        .string "calle - tmp is located at address location:%p and p is:%p \n"
calle:
        sub     rsp, 24
        mov     rdx, rdi
        xor     eax, eax
        lea     rsi, [rsp+12]
        mov     edi, OFFSET FLAT:.LC0
        mov     DWORD PTR [rsp+12], 9
        call    printf
        add     rsp, 24
        ret
.LC1:
        .string "caller - address of tmp1:%p \n"
caller:
        sub     rsp, 24
        mov     edi, OFFSET FLAT:.LC1
        xor     eax, eax
        lea     rsi, [rsp+8]
        mov     DWORD PTR [rsp+8], 12
        call    printf
        lea     rdx, [rsp+8]
        lea     rsi, [rsp+12]
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        mov     DWORD PTR [rsp+12], 9
        call    printf
        add     rsp, 24
        ret
main:
        sub     rsp, 8
        xor     eax, eax
        call    caller
        xor     eax, eax
        add     rsp, 8
        ret

As you can see, the calle function was inlined into caller, so caller calls printf twice: first with the LC1 string, then with the LC0 string. The first call prints rsp+8, which is the address of tmp1; the second prints rsp+12, which is the address of tmp (the local that used to belong to calle). gcc is free to lay out these variables in whatever order it chooses.

You can add __attribute__((__noinline__)) to calle to "fix" this, but you shouldn't expect the addresses of variables to have any particular order at all (except where the language guarantees one, as within arrays and structures).

P.S. Passing anything other than a void* pointer to the "%p" printf conversion is technically undefined behaviour, so you should cast the argument to void* before printing: printf("caller - address of tmp1:%p \n", (void*)&tmp1);

Upvotes: 4
