Reputation: 86
I'm reading about structure padding in C here: http://www.catb.org/esr/structure-packing/.
I don't understand why padding determined during compile-time for variables/structures allocated on stack is valid semantically in all cases. Let me provide an example. Say we have this toy code to be compiled:
int main() {
int a;
a = 1;
}
On X86-64 gcc -S -O0 a.c
generates this assembly (unnecessary symbols removed):
main:
pushq %rbp
movq %rsp, %rbp
movl $1, -4(%rbp)
movl $0, %eax
popq %rbp
ret
In this case why do we know that value of %rbp
and consequently %rbp-4
is 4-aligned to be suitable for storing/loading int?
Let's try the same example with structs.
struct st{
char a;
int b;
}
From the reading I infer that padded version of structure looks something like this:
struct st{
char a; // 1 byte
char pad[3]; // 3 bytes
int b; // 4 bytes
}
So, second toy example
int main() {
struct st s;
s.a = 1;
s.b = 2;
}
generates
main:
pushq %rbp
movq %rsp, %rbp
movb $1, -8(%rbp)
movl $2, -4(%rbp)
movl $0, %eax
popq %rbp
ret
And we observe that this really is the case. But again, what is the guarantee that value of rbp
itself on arbitrary stack frame is properly aligned? Isn't the value of rbp
available only in run time? How can compiler align members of struct if nothing is known about alignment of struct's start address at compile time?
Upvotes: 3
Views: 953
Reputation: 47593
As @P__J__ points out (in a now deleted answer) - how a C compiler generates code is an implementation detail. Since you tagged this as an ABI question, your real question is "When GCC is targeting Linux, how is it allowed to assume that RSP has any particular minimum alignment?". The 64-bit ABI that Linux uses is the AMD64(x86-64) System V ABI. The minimum alignment of the stack just before CALLing an ABI compliant1,2 function (including main
) is guaranteed to be a minimum of 16 bytes (it can be 32 byte or 64 bytes depending on the types passed to the function). The ABI states:
3.2.2 The Stack Frame
In addition to registers, each function has a frame on the run-time stack. This stack grows downwards from high addresses. Figure 3.3 shows the stack organization. The end of the input argument area shall be aligned on a 16 (32 or 64, if __m256 or __m512 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32 or 64) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame.
You may ask why the mention RSP+8 being a multiple of 16 (and not RSP+0). This is because the concept of CALLing a function implies that an 8 byte return address will be placed on the stack by the CALL instruction itself. Whether a function is called or jumped to (ie: tail call), the code generator always assumes that just prior to executing the first instruction in a function the stack is always misaligned by 8. There is an automatic guarantee though that the stack will be aligned on an 8 byte boundary. If you subtract 8 from RSP you are guaranteed to be 16 byte aligned once again.
It is noteworthy to observe that the code below guarantees that after the PUSHQ
the stack is aligned on a 16 byte boundary since the PUSH
instruction decreases RSP by 8 and aligns the stack to a 16 byte boundary once again:
main:
# <------ Stack pointer (RSP) misaligned by 8 bytes
pushq %rbp
# <------ Stack pointer (RSP) aligned to 16 byte boundary
movq %rsp, %rbp
movb $1, -8(%rbp)
movl $2, -4(%rbp)
movl $0, %eax
popq %rbp
ret
For 64-bit code, the conclusion one can draw from all this is that although the actual value of the stack pointer is known at run-time, the ABI allows us to infer that the value upon entry to a function has a particular alignment and the compilers code generation system can use that to its advantage when placing a struct
on the stack.
A logical question is - if the stack alignment that can be guaranteed upon entry to a function is not enough for the alignment of a struct or data type placed on the stack, what does the GCC compiler do? Consider this revision to your program:
struct st{
char a; // 1 byte
char pad[3]; // 3 bytes
int b; // 4 bytes
};
int main() {
struct st s __attribute__(( aligned(32)));
s.a = 1;
s.b = 2;
}
We've told GCC that the variable s
should be 32 byte aligned. A function that can guarantee 16 byte stack alignment doesn't guarantee 32 byte alignment (32 byte alignment does guarantee 16 byte alignment since 32 is evenly divisible by 16). The GCC compiler will have to generate function prologue so that s
can be properly aligned. You can look at the unoptimized output of godbolt for this program to see how GCC achieves this:
main:
pushq %rbp
movq %rsp, %rbp
andq $-32, %rsp # ANDing RSP with -32 (0xFFFFFFFFFFFFFFE0)
# rounds RSP down to next 32 byte boundary
# by zeroing the lower 5 bits of RSP.
movb $1, -32(%rsp)
movl $2, -28(%rsp)
movl $0, %eax
leave
ret
Upvotes: 7
Reputation: 58608
In this case why do we know that value of %rbp and consequently %rbp-4 is 4-aligned to be suitable for storing/loading int?
In this particular case, we know that we are on an x86 processor on which any address is suitable for loading and storing an integer. The caller could decrement or offset a previously aligned %rbp
by 17 and it wouldn't make a difference, other than possibly to performance.
Yet, it is aligned. Why we know that is that it's an invariant of the system that we trust, required by the ABI. If the stack pointer is not aligned, it means the caller violated an aspect of the calling conventions.
Unless we are a receiving a call from a separate security domain (like a kernel receiving a system call from user space) we simply trust the caller. How does the strcmp
function know that its arguments point to valid, null-terminated strings? It trusts the caller. Same thing.
If a function receives an aligned %rsp
and ensures that all manipulations of it preserve alignment, then whatever functions it calls receive an aligned %rsp
also. Ensuring that all calls are made with the required stack alignment is ensured by the compiler. If you're writing assembly code, you have to ensure that yourself.
How can compiler align members of struct if nothing is known about alignment of struct's start address at compile time?
The members of a struct
are given offsets under the assumption that the run-time base address of the object will be suitably aligned for even the most strictly aligned struct member. This is why the first member of a struct is simply placed at offset zero, regardless of its type.
The run-time has to ensure that any address allocated for an arbitrary object has the strictest alignment of any standard type, alignof(maxalign_t)
. For instance, if the strictest alignment on a system is 16 bytes (like in x86-64 System V), then malloc
has to hand out pointers to 16-byte-aligned addresses. Then any kind of struct can be placed into the resulting memory.
If you write your own supposedly general-purpose allocator that hands out 4-byte-aligned pointers on a system where alignment may be as strict as 16, then it's wrong.
(Note that __m256
and __m512
types don't count for maxalign_t
: malloc
still only has to ensure 16-byte alignment in x86-64 System V, and isn't sufficient for over-aligned types like __m256
or a custom struct foo { alignas(32) int32_t a[8]; };
. Use aligned_alloc()
for over-aligned types.)
Also note that the wording in the ISO C standard is that memory returned by malloc
has to be usable for any type. A 4-byte allocation can't hold a 16-byte type anyway, so small allocations are allowed to be less than 16-byte aligned.
Upvotes: 4