Reputation: 280
I'm currently playing around with stack frames, trying to get an idea of how they work. The articles I have read all explain that the general structure is:
local vars   <--- SP    low address
old BP       <--- BP
ret addr
args                    high address
I have an example program that calls a function with three arguments and has two buffers as local variables:
#include <stdio.h>

void function(int a, int b, int c);

int main()
{
    function(1, 2, 3);
    return 0;
}

void function(int a, int b, int c)
{
    char buffer1[5];
    char buffer2[10];
}
I took a look at the assembly code of the program and was surprised not to find what I expected at the function call. I expected something along the lines of:
# The arguments are pushed onto the stack:
push 3
push 2
push 1
call function # Pushes ret address onto stack and changes IP to function
...
# In function:
# Push old base pointer onto stack and set current base pointer to point to it
push rbp
mov rbp, rsp
# Reserve space for stack frame etc....
So that the structure of the frame, when executing the function, would be something like:
buffers <--- SP low address
old BP <--- BP
ret Addr
1
2
3 high address
But instead what happens is the following:
The function call:
mov edx, 3
mov esi, 2
mov edi, 1
call function
Why use registers here when we could just push to the stack? And in the function that we call:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
sub rsp, 48
mov DWORD PTR [rbp-36], edi
mov DWORD PTR [rbp-40], esi
mov DWORD PTR [rbp-44], edx
mov rax, QWORD PTR fs:40
mov QWORD PTR [rbp-8], rax
xor eax, eax
mov rax, QWORD PTR [rbp-8]
xor rax, QWORD PTR fs:40
je .L3
call __stack_chk_fail
As far as I can see, 48 bytes are reserved for the stack frame, right? Afterwards, the arguments are copied from the registers used in the function call to the end of the stack. So it would look something like this:
3 <--- SP
2
1
??
??
old BP <--- BP
return Address
??
I assume the buffers are somewhere between the args and the old BP, but I'm really not sure where exactly. Since they are only 15 bytes in total and 48 bytes were reserved, won't there be a bunch of unused space in there?
Can someone help me outline what is happening here? Is this something that is processor dependent? I'm using an Intel i7.
Cheers, Brick
Upvotes: 2
Views: 1336
Reputation: 22348
There are a couple of issues. First, the three arguments are passed in registers because that's part of the x86-64 System V ABI specification, which Linux uses. I'm not sure where the latest x86-64 SysV ABI document is kept these days (x86-64.org seems defunct). Agner Fog maintains a lot of excellent documentation, including one on calling conventions.
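To make the register assignment concrete, here is a minimal sketch in C of the order in which the x86-64 System V convention hands out registers for integer and pointer arguments. The helper name int_arg_location is hypothetical and only for illustration; it ignores floating-point and aggregate arguments, which follow different rules. The edi, esi and edx in the question's listing are the lower 32 bits of rdi, rsi and rdx.

```c
#include <stddef.h>

/* Integer/pointer argument registers of the x86-64 System V calling
   convention, in the order they are assigned. Floating-point arguments
   (xmm registers) are not covered by this sketch. */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: returns where the index-th (0-based) integer
   argument is passed: a register name, or "stack" once the six
   registers are used up. */
const char *int_arg_location(size_t index)
{
    size_t count = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];
    return index < count ? INT_ARG_REGS[index] : "stack";
}
```

For function(1, 2, 3) this gives rdi, rsi and rdx for a, b and c, which matches the three mov instructions before the call.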
The stack allocation is complicated by the call to __stack_chk_fail, which is added as a countermeasure to detect stack-smashing / buffer overruns. The ABI also specifies that the stack must be 16-byte aligned prior to a function call. If you recompile with -fno-stack-protector, you'll get a better idea of what's going on.
Furthermore, because the function doesn't do anything, it's not a particularly good example. It stores the arguments (needlessly), requiring 12 bytes. buffer1 and buffer2 are probably 8-byte aligned, effectively requiring 8 and 16 bytes respectively, with possibly another 4 bytes to align them. I might be wrong on this - I haven't got the spec at hand. So that's either 36 or 40 bytes. Call alignment then requires rounding up to a multiple of 16 bytes, which gives 48 bytes.
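The rounding step in that estimate can be sketched in C. This only illustrates the arithmetic; the helper name align_up is hypothetical, and the actual layout inside the frame is the compiler's choice.

```c
#include <stddef.h>

/* Round size up to the next multiple of alignment.
   Assumes alignment is a power of two (16 in this case). */
size_t align_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}
```

Both align_up(36, 16) and align_up(40, 16) evaluate to 48, so either estimate is consistent with the sub rsp, 48 in the listing.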
I think it would be more instructive to turn off the stack protection and examine the stack frame for this leaf function, and consult the x86-64 ABI spec for the alignment requirements of local variables, etc.
Upvotes: 1
Reputation: 214
It is rather compiler dependent. You can try turning off optimization or flagging the function with the "extern" keyword (to force the default calling convention).
Registers are used because this is much faster than passing arguments on the stack.
Upvotes: 0