Understanding OSX 16-Byte alignment

Question

So it seems like everyone knows that the stack must always be 16-byte aligned for OSX syscalls. Great, that makes sense when you have code like this:

section .data
  message db 'something', 10, 0

section .text
  global start

start:
push    10         ; size of the message (4 bytes)
push    message    ; the address of the message (4 bytes)
push    1          ; we want to write to STD_OUT (4 bytes)
mov     eax, 4     ; write(...) syscall
sub     esp, 4     ; move the stack pointer down 4 more bytes for a total of 16
int     0x80       ; invoke
add     esp, 16    ; clean

Perfect, the stack is aligned to 16 bytes; makes perfect sense. But what about calling syscall 1 (exit)? Logically that would look something like this:

push    69         ; return value
mov     eax, 1    ; exit(...) syscall
sub     esp, 12   ; push down stack for total of 16 bytes.
int     0x80      ; invoke

This doesn't work though, but this does:

push    69         ; return value
mov     eax, 1    ; exit(...) syscall
sub     esp, 4    ; push down stack for total of 8 bytes.
int     0x80      ; invoke

That works fine, but that's only 8 bytes?! OSX is cool, but this ABI is driving me nuts. Can someone shed some light on what I'm not understanding?

Ken Thomases · Accepted Answer

Short version: you probably don't need to align to 16 bytes, you just need to always leave a 4-byte gap before your argument list.

Long version:

Here's what I think is happening: I'm not sure that it's true that the stack should be 16-byte aligned. However, logic dictates that if it is, and if padding or adjusting the stack is necessary to achieve that alignment, it must happen before the arguments for the syscall are pushed, not after. There can't be an arbitrary number of bytes between the stack pointer at the time of the int 0x80 instruction and where the arguments actually are, because the kernel wouldn't know where to find the actual arguments. Subtracting from the stack pointer after pushing the arguments to achieve "alignment" doesn't align the arguments; it aligns the stack pointer by inserting an arbitrary number of bytes between the stack pointer and the arguments. Whatever else may be true, that can't be right.

Then why do the first and third snippets work at all? Don't they also insert arbitrary bytes there? They work by accident: they both happen to insert exactly 4 bytes. That adjustment doesn't succeed because it achieves stack alignment; it succeeds because it is part of the syscall ABI. Apparently, the syscall ABI expects and requires a 4-byte slot before the argument list.

The source for the syscall() function can be found here. It looks like this:

LEAF(___syscall, 0)
    popl    %ecx        // ret addr
    popl    %eax        // syscall number
    pushl   %ecx
    UNIX_SYSCALL_TRAP
    movl    (%esp),%edx // add one element to stack so
    pushl   %ecx        // caller "pop" will work
    jnb 2f
    BRANCH_EXTERN(cerror)
2:
END(___syscall)

To call this library function, the caller will have set up the stack pointer to point to the arguments to the syscall() function, which start with the syscall number followed by the real arguments for the actual syscall. However, the caller will then have used a call instruction to call it, which pushed the return address onto the stack.

So, the above code pops the return address, pops the syscall number into %eax, pushes the return address back onto the stack (where the syscall number originally was), and then does int 0x80. So, the stack pointer points to the return address, followed by the arguments. There are the extra 4 bytes: the return address. I suspect the kernel ignores the return address. I guess its presence in the syscall ABI may just be to make the ABI for system calls similar to that of function calls.

What does this mean for the alignment requirement of syscalls? Well, this function does nothing to align the arguments for the kernel. The caller presumably set up the stack with 16-byte alignment at the point of the call, with the syscall number in the first slot; by the time of the interrupt, that slot holds the return address instead, so the actual syscall arguments start 4 bytes past the 16-byte boundary. It may just be a myth that the stack needs to be 16-byte aligned for syscalls. On the other hand, the 16-byte alignment requirement is definitely real for calling system library functions. The Wine project, for which I develop, was burned by it. It is mostly necessary for 128-bit SSE argument data types, but Apple made their lazy symbol resolver deliberately blow up if the alignment is wrong, even for functions which don't use such arguments, so that problems would be found early. Syscalls would not be subject to that early-failure mechanism. It may be that the kernel doesn't require the 16-byte alignment. I'm not sure if any syscalls take 128-bit arguments.
