Reputation: 11439
So it seems like everyone knows that OS X syscalls always need the stack 16-byte aligned. Great, that makes sense when you have code like this:
section .data
msg db 'something', 10, 0 ; label must match the "push msg" below
section .text
global start
start:
push 10 ; size of the message (4 bytes)
push msg ; the address of the message (4 bytes)
push 1 ; we want to write to stdout, file descriptor 1 (4 bytes)
mov eax, 4 ; write(...) syscall
sub esp, 4 ; move the stack pointer down 4 more bytes, for a total of 16
int 0x80 ; invoke
add esp, 16 ; clean
Perfect, the stack is aligned to 16 bytes, makes perfect sense. But what if we call syscall 1 (exit)? Logically that would look something like this:
push 69 ; return value
mov eax, 1 ; exit(...) syscall
sub esp, 12 ; push down stack for total of 16 bytes.
int 0x80 ; invoke
This doesn't work though, but this does:
push 69 ; return value
mov eax, 1 ; exit(...) syscall
sub esp, 4 ; push down stack for total of 8 bytes.
int 0x80 ; invoke
That works fine, but that's only 8 bytes? OS X is cool, but this ABI is driving me nuts. Can someone shed some light on what I'm not understanding?
Upvotes: 4
Views: 2037
Reputation: 90701
Short version: you probably don't need to align to 16 bytes; you just need to always leave a 4-byte gap between the stack pointer and your argument list.
Long version:
Here's what I think is happening: I'm not sure that it's true that the stack should be 16-byte aligned. However, logic dictates that if it is and if padding or adjusting the stack is necessary to achieve that alignment, it must happen before the arguments for the syscall are pushed, not after. There can't be an arbitrary number of bytes between the stack pointer at the time of the int 0x80 instruction and where the arguments actually are. The kernel wouldn't know where to find the actual arguments. Subtracting from the stack pointer after pushing the arguments to achieve "alignment" doesn't align the arguments, it aligns the stack pointer by inserting an arbitrary number of bytes between the stack pointer and the arguments. Whatever else may be true, that can't be right.
Then why do the first and third snippets work at all? Don't they also insert arbitrary bytes there? They work by accident. It's because they both happen to insert 4 bytes. That adjustment isn't "successful" because it achieves stack alignment, it's part of the syscall ABI. Apparently, the syscall ABI expects and requires that there be a 4-byte slot before the argument list.
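As a rough sketch of that layout, assuming 32-bit OS X and the NASM syntax from the question: if you want padding at all, reserve it first, then push the arguments, and make the 4-byte slot the last thing below them. The 8 bytes of padding here are an arbitrary illustrative amount, not something I know the kernel to require.

```nasm
; exit(69) with optional padding placed BEFORE the arguments
sub esp, 8        ; optional padding (illustrative amount); ends up above the arguments
push dword 69     ; argument 1: exit status
sub esp, 4        ; the 4-byte slot directly below the argument list
mov eax, 1        ; exit(...) syscall number
int 0x80          ; the first argument is at [esp + 4]
```

The second snippet in the question fails because its 12-byte adjustment puts the argument at [esp + 12] instead of [esp + 4].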
The source for the syscall() function can be found here. It looks like this:
LEAF(___syscall, 0)
popl %ecx // ret addr
popl %eax // syscall number
pushl %ecx
UNIX_SYSCALL_TRAP
movl (%esp),%edx // add one element to stack so
pushl %ecx // caller "pop" will work
jnb 2f
BRANCH_EXTERN(cerror)
2:
END(___syscall)
To call this library function, the caller will have set up the stack pointer to point to the arguments to the syscall() function, which starts with the syscall number and then has the real arguments for the actual syscall. However, the caller will then have used a call instruction to call it, which pushed the return address onto the stack.
So, the above code pops the return address, pops the syscall number into %eax, pushes the return address back onto the stack (where the syscall number originally was), and then does int 0x80. So, the stack pointer points to the return address and then the arguments. There's the extra 4 bytes: the return address. I suspect the kernel ignores the return address. I guess its presence in the syscall ABI may just be to make the ABI for system calls similar to that of function calls.
What does this mean for the alignment requirement of syscalls? Well, this function is guaranteed to change the alignment of the stack from how it was set up by its caller. The caller presumably set up the stack with 16-byte alignment and this function moves it by 4 bytes before the interrupt. It may just be a myth that the stack needs to be 16-byte aligned for syscalls. On the other hand, the 16-byte alignment requirement is definitely real for calling system library functions. The Wine project, for which I develop, was burned by it. It is mostly necessary for 128-bit SSE argument data types, but Apple made their lazy symbol resolver deliberately blow up if the alignment is wrong even for functions which don't use such arguments so that problems would be found early. Syscalls would not be subject to that early-failure mechanism. It may be that the kernel doesn't require the 16-byte alignment. I'm not sure if any syscalls take 128-bit arguments.
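For contrast, here is a sketch of the library-call case, where the alignment does matter. It assumes a 32-bit OS X program linked against the C library, with _main as the entry point and a caller that followed the ABI, so that esp is 4 bytes short of a 16-byte boundary on entry because of the pushed return address.

```nasm
extern _exit
section .text
global _main

_main:
    sub esp, 12           ; entry esp is 16n - 4; now 16n - 16, i.e. 16-byte aligned
    mov dword [esp], 69   ; argument 1 goes at the bottom of the aligned frame
    call _exit            ; esp is 16-byte aligned at the call instruction
```

Here the padding and the argument share one aligned frame that is set up before the call, which is the opposite order from subtracting after the pushes.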
Upvotes: 4