dav23r
dav23r

Reputation: 86

Why padding in C is valid for variables/structs allocated on stack?

I'm reading about structure padding in C here: http://www.catb.org/esr/structure-packing/.
I don't understand why padding determined during compile-time for variables/structures allocated on stack is valid semantically in all cases. Let me provide an example. Say we have this toy code to be compiled:

int main() {
    int a;
    a = 1;
}

On X86-64 gcc -S -O0 a.c generates this assembly (unnecessary symbols removed):

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $1, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret

In this case why do we know that value of %rbp and consequently %rbp-4 is 4-aligned to be suitable for storing/loading int?

Let's try the same example with structs.

struct st{
    char a;
    int b;
}

From the reading I infer that padded version of structure looks something like this:

struct st{
    char a;      // 1 byte
    char pad[3]; // 3 bytes
    int b;       // 4 bytes
}

So, second toy example

int main() {
    struct st s;
    s.a = 1;
    s.b = 2;
}

generates

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movb    $1, -8(%rbp)
    movl    $2, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret

And we observe that this really is the case. But again, what is the guarantee that value of rbp itself on arbitrary stack frame is properly aligned? Isn't the value of rbp available only in run time? How can compiler align members of struct if nothing is known about alignment of struct's start address at compile time?

Upvotes: 3

Views: 953

Answers (2)

Michael Petch
Michael Petch

Reputation: 47593

As @P__J__ points out (in a now deleted answer) - how a C compiler generates code is an implementation detail. Since you tagged this as an ABI question, your real question is "When GCC is targeting Linux, how is it allowed to assume that RSP has any particular minimum alignment?". The 64-bit ABI that Linux uses is the AMD64(x86-64) System V ABI. The minimum alignment of the stack just before CALLing an ABI compliant1,2 function (including main) is guaranteed to be a minimum of 16 bytes (it can be 32 byte or 64 bytes depending on the types passed to the function). The ABI states:

3.2.2 The Stack Frame

In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Figure 3.3 shows the stack organization. The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.

You may ask why the mention RSP+8 being a multiple of 16 (and not RSP+0). This is because the concept of CALLing a function implies that an 8 byte return address will be placed on the stack by the CALL instruction itself. Whether a function is called or jumped to (ie: tail call), the code generator always assumes that just prior to executing the first instruction in a function the stack is always misaligned by 8. There is an automatic guarantee though that the stack will be aligned on an 8 byte boundary. If you subtract 8 from RSP you are guaranteed to be 16 byte aligned once again.

It is noteworthy to observe that the code below guarantees that after the PUSHQ the stack is aligned on a 16 byte boundary since the PUSH instruction decreases RSP by 8 and aligns the stack to a 16 byte boundary once again:

main:
                             # <------ Stack pointer (RSP) misaligned by 8 bytes
    pushq   %rbp
                             # <------ Stack pointer (RSP) aligned to 16 byte boundary
    movq    %rsp, %rbp
    movb    $1, -8(%rbp)
    movl    $2, -4(%rbp)
    movl    $0, %eax
    popq    %rbp
    ret

For 64-bit code, the conclusion one can draw from all this is that although the actual value of the stack pointer is known at run-time, the ABI allows us to infer that the value upon entry to a function has a particular alignment and the compilers code generation system can use that to its advantage when placing a struct on the stack.


When a function's stack alignment isn't enough for a variable's alignment?

A logical question is - if the stack alignment that can be guaranteed upon entry to a function is not enough for the alignment of a struct or data type placed on the stack, what does the GCC compiler do? Consider this revision to your program:

struct st{
    char a;      // 1 byte
    char pad[3]; // 3 bytes
    int b;       // 4 bytes
};

int main() {
    struct st s __attribute__(( aligned(32)));
    s.a = 1;
    s.b = 2;
}

We've told GCC that the variable s should be 32 byte aligned. A function that can guarantee 16 byte stack alignment doesn't guarantee 32 byte alignment (32 byte alignment does guarantee 16 byte alignment since 32 is evenly divisible by 16). The GCC compiler will have to generate function prologue so that s can be properly aligned. You can look at the unoptimized output of godbolt for this program to see how GCC achieves this:

main:
        pushq   %rbp
        movq    %rsp, %rbp
        andq    $-32, %rsp    # ANDing RSP with -32 (0xFFFFFFFFFFFFFFE0) 
                              # rounds RSP down to next 32 byte boundary
                              # by zeroing the lower 5 bits of RSP.
        movb    $1, -32(%rsp) 
        movl    $2, -28(%rsp)
        movl    $0, %eax
        leave
        ret

Footnotes

  • 1The AMD64 System V ABI is also used by 64-bit Solaris, MacOS, and BSD as well as Linux
  • 2The 64-bit Microsoft Windows calling convention (ABI) guarantees that prior to a function call that the stack is 16-byte aligned (8 byte misaligned just prior to the first instruction of the function being executed).

Upvotes: 7

Kaz
Kaz

Reputation: 58608

In this case why do we know that value of %rbp and consequently %rbp-4 is 4-aligned to be suitable for storing/loading int?

In this particular case, we know that we are on an x86 processor on which any address is suitable for loading and storing an integer. The caller could decrement or offset a previously aligned %rbp by 17 and it wouldn't make a difference, other than possibly to performance.

Yet, it is aligned. Why we know that is that it's an invariant of the system that we trust, required by the ABI. If the stack pointer is not aligned, it means the caller violated an aspect of the calling conventions.

Unless we are a receiving a call from a separate security domain (like a kernel receiving a system call from user space) we simply trust the caller. How does the strcmp function know that its arguments point to valid, null-terminated strings? It trusts the caller. Same thing.

If a function receives an aligned %rsp and ensures that all manipulations of it preserve alignment, then whatever functions it calls receive an aligned %rsp also. Ensuring that all calls are made with the required stack alignment is ensured by the compiler. If you're writing assembly code, you have to ensure that yourself.

How can compiler align members of struct if nothing is known about alignment of struct's start address at compile time?

The members of a struct are given offsets under the assumption that the run-time base address of the object will be suitably aligned for even the most strictly aligned struct member. This is why the first member of a struct is simply placed at offset zero, regardless of its type.

The run-time has to ensure that any address allocated for an arbitrary object has the strictest alignment of any standard type, alignof(maxalign_t). For instance, if the strictest alignment on a system is 16 bytes (like in x86-64 System V), then malloc has to hand out pointers to 16-byte-aligned addresses. Then any kind of struct can be placed into the resulting memory.

If you write your own supposedly general-purpose allocator that hands out 4-byte-aligned pointers on a system where alignment may be as strict as 16, then it's wrong.


(Note that __m256 and __m512 types don't count for maxalign_t: malloc still only has to ensure 16-byte alignment in x86-64 System V, and isn't sufficient for over-aligned types like __m256 or a custom struct foo { alignas(32) int32_t a[8]; };. Use aligned_alloc() for over-aligned types.)

Also note that the wording in the ISO C standard is that memory returned by malloc has to be usable for any type. A 4-byte allocation can't hold a 16-byte type anyway, so small allocations are allowed to be less than 16-byte aligned.

Upvotes: 4

Related Questions