Reputation: 1410
I'm trying to follow along with the author of my book, and he gives us this example function with a prologue and epilogue (no local variables in the function):
1: push ebp
2: mov ebp, esp
3: ...
4: movsx eax, word ptr [ebp+8]
5: movsx ecx, word ptr [ebp+0Ch]
6: add eax, ecx
7: ...
8: mov esp, ebp
9: pop ebp
10: retn
that is invoked by
push eax ; param 2
push ecx ; param 1
call addme
add esp, 8 ; cleanup stack
In this example, is line 8 not a redundant instruction? I mean, is EBP not already equal to ESP in this context? Nothing has been pushed onto or popped off the stack since.
My assumption is that this line would only be necessary if we had local variables that were pushed onto the stack, and this would be a method to clear the stack of those local variables?
I would just like to confirm that this is the case.
Upvotes: 2
Views: 4360
Reputation: 364210
You're correct, it's redundant if you know that esp is already pointing at the location where you pushed your caller's ebp.
When gcc compiles a function with -fno-omit-frame-pointer, it does in fact do the optimization you suggest of just popping ebp when it knows that esp is already pointing in the right place.
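To make the "esp is already in the right place" reasoning concrete, here is a minimal sketch in C that models the stack as an array and replays the question's call sequence, prologue and epilogue. The Cpu struct, the push/pop helpers and the function name are hypothetical, invented only for this illustration; the sketch assumes the elided lines 3 and 7 do not touch ESP.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of the x86 stack: a word array plus ESP/EBP as slot indices.
   Hypothetical names, for illustration only. */
typedef struct {
    uint32_t mem[64];
    uint32_t esp;   /* index of the top-of-stack slot; stack grows downward */
    uint32_t ebp;
} Cpu;

static void push(Cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c)          { return c->mem[c->esp++]; }

/* Replays the question's code, assuming lines 3 and 7 leave ESP alone.
   Returns 1 if ESP == EBP just before line 8 and the frame unwinds cleanly. */
int esp_equals_ebp_before_line8(void)
{
    Cpu c;
    memset(&c, 0, sizeof c);
    c.esp = 64;              /* empty stack */
    c.ebp = 0xDEAD;          /* caller's frame pointer (arbitrary marker) */

    push(&c, 2);             /* push eax ; param 2 */
    push(&c, 1);             /* push ecx ; param 1 */
    push(&c, 0x1234);        /* call addme pushes a return address */

    push(&c, c.ebp);         /* 1: push ebp */
    c.ebp = c.esp;           /* 2: mov ebp, esp */
    /* 4-6: the two loads and the add never modify ESP */
    int same = (c.esp == c.ebp);
    c.esp = c.ebp;           /* 8: mov esp, ebp (changes nothing here) */
    c.ebp = pop(&c);         /* 9: pop ebp */
    (void)pop(&c);           /* 10: retn pops the return address */

    /* The two params are still on the stack; the caller's add esp, 8 removes them. */
    return same && c.ebp == 0xDEAD && c.esp == 62;
}
```

In this model line 8 assigns ESP the value it already holds, which is why a compiler that tracks ESP can drop the instruction.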
This is very common in functions that use call-preserved registers (like ebx), which also have to be saved/restored like ebp. Compilers typically do all the saves/restores in the prologue/epilogue before anything like reserving space for a C99 variable-size array. So pop ebx will always leave esp pointing to the right place for pop ebp.
e.g. clang 3.8's output (with -O3 -m32) for this function, on the Godbolt compiler explorer. As is common, compilers don't quite make optimal code:
void extint(int); // a function that can't inline because the compiler can't see the definition.
int save_reg_framepointer(int a){
extint(a);
return a;
}
# clang3.8
push ebp
mov ebp, esp # stack-frame boilerplate
push esi # save a call-preserved reg
push eax # align the stack to 16B
mov esi, dword ptr [ebp + 8] # load `a` into a register that will survive the function call.
mov dword ptr [esp], esi # store the arg for extint. Doing this with an ebp-relative address would have been slightly more efficient, but just push esi here instead of push eax earlier would make even more sense
call extint
mov eax, esi # return value
add esp, 4 # pop the arg
pop esi # restore esi
pop ebp # restore ebp. Notice the lack of a mov esp, ebp here, or even a lea esp, [ebp-4] before the first pop.
ret
Of course, a human (borrowing a trick from gcc) could do better:
# hand-written based on tricks from gcc and clang, and avoiding their suckage
call_non_inline_and_return_arg:
push ebp
mov ebp, esp # stack-frame boilerplate if we have to.
push esi # save a call-preserved reg
mov esi, dword [ebp + 8] # load `a` into a register that will survive the function call
push esi # replacing push eax / mov
call extint
mov eax, esi # return value. Could mov eax, [ebp+8]
mov esi, [ebp-4] # restore esi without a pop, since we know where we put it, and esp isn't pointing there.
leave # same as mov esp, ebp / pop ebp. 3 uops on recent Intel CPUs
ret
Since the stack needs to be aligned by 16 before a call (according to the rules of the SystemV i386 ABI, see links in the x86 tag wiki), we might as well save/restore an extra reg, instead of just push [ebp+8] and then (after the call) mov eax, [ebp+8]. Compilers favour saving/restoring call-preserved registers over reloading local data multiple times.
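The alignment bookkeeping can be checked with a little arithmetic. The sketch below is only an illustration (the function name is hypothetical): it assumes the caller had esp 16-byte aligned just before its own call, so the pushed return address leaves esp at 12 modulo 16 on entry, and every 4-byte push moves it down by 4.

```c
/* Returns ESP modulo 16 at the call instruction, given the number of
   4-byte pushes a function performs between its entry and that call.
   Assumption: ESP was 16-byte aligned right before the call that
   entered this function, so ESP % 16 == 12 on entry. */
unsigned esp_mod16_at_call(unsigned pushes_before_call)
{
    unsigned esp = 16u - 4u;              /* return address already pushed */
    for (unsigned i = 0; i < pushes_before_call; i++)
        esp = (esp + 16u - 4u) % 16u;     /* each push subtracts 4 */
    return esp;
}
```

With three pushes (ebp, esi, and the argument) the result is 0, so the call sees an aligned stack. With only two pushes (ebp and the argument) the result is 4, so the alignment rule is not met.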
If not for the stack-alignment rules in the current version of the ABI, I might write:
# hand-written: esp alignment not preserved on the call
call_no_stack_align:
push ebp
mov ebp, esp # stack-frame boilerplate if we have to.
push dword [ebp + 8] # function arg. 2 uops for push with a memory operand
call extint # esp is offset by 12 from before the `call` that called us: return address, ebp, and function arg.
mov eax, [ebp+8] # return value, which extint won't have modified because it only takes one arg
leave # same as mov esp, ebp / pop ebp. 3 uops on recent Intel CPUs
ret
gcc will actually use leave instead of mov / pop, in cases where it does need to modify esp before popping ebp. For example, flip Godbolt to gcc (instead of clang), and take out -m32 so we're compiling for x86-64 (where args are passed in registers). This means there's no need to pop args off the stack after a call, so rsp is set correctly to just pop two regs. (push/pop use 8 bytes of stack, but rsp still has to be 16B-aligned before a call in the SysV AMD64 ABI, so gcc actually does a sub rsp, 8 and corresponding add around the call.)
Another missed optimization: with gcc -m32, the variable-length-array function uses an add esp, 16 / leave after the call. The add is totally useless. (Add -m32 to the gcc args on Godbolt.)
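For readers who want something to paste into a compiler explorer, here is a hypothetical stand-in for a function with a C99 variable-length array. It is not necessarily the function behind the Godbolt link; the name and body are made up for illustration. Because the array size is only known at run time, the amount subtracted from esp is not a compile-time constant, so the epilogue typically restores esp from the frame pointer (mov esp, ebp or leave) rather than using a fixed add.

```c
#include <stddef.h>

/* Hypothetical example: sums 0..n-1 through a C99 variable-length array.
   An optimizing compiler may remove the array entirely, so inspect the
   assembly at a low optimization level if the VLA disappears. */
int sum_with_vla(size_t n)
{
    if (n == 0)
        return 0;              /* a VLA must have a nonzero size */

    int tmp[n];                /* stack space reserved at run time */
    for (size_t i = 0; i < n; i++)
        tmp[i] = (int)i;

    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += tmp[i];
    return total;
}
```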
Upvotes: 6
Reputation: 5980
You don't know what is on lines 3 and 7, so I assume line 8 is not redundant in the general case. Normally the function would work without line 8, because the value of ESP at the end of your function will normally be the same as at the start of your function. But I can imagine some dirty scenarios where line 8 is used to clean something up: for example, if you do a push-call sequence and omit the final ADD ESP,n. Then you can simply use MOV ESP,EBP to fix ESP at the end of your function. Dirty but working.
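A minimal sketch of that scenario, again using a toy stack model in C (the Cpu struct, helpers and function name are hypothetical and exist only for this illustration): the function body pushes an argument, calls something, and never adjusts ESP afterwards, so MOV ESP,EBP is what discards the leftover slot.

```c
#include <stdint.h>

/* Toy stack model: ESP/EBP are slot indices, stack grows downward. */
typedef struct { uint32_t mem[64]; uint32_t esp, ebp; } Cpu;

static void push(Cpu *c, uint32_t v) { c->mem[--c->esp] = v; }
static uint32_t pop(Cpu *c)          { return c->mem[c->esp++]; }

/* Returns how many stale stack slots "mov esp, ebp" throws away when the
   body does a push/call sequence without the matching "add esp, n". */
uint32_t slots_discarded_by_line8(void)
{
    Cpu c = { .esp = 64, .ebp = 0 };

    push(&c, 0x1234);   /* return address into our caller */
    push(&c, c.ebp);    /* push ebp */
    c.ebp = c.esp;      /* mov ebp, esp */

    push(&c, 7);        /* push an argument for some callee */
    push(&c, 0x5678);   /* call: pushes a return address */
    (void)pop(&c);      /* the callee's ret pops it again */
    /* deliberately no "add esp, 4" here */

    uint32_t stale = c.ebp - c.esp;   /* slots still below the saved ebp */
    c.esp = c.ebp;      /* mov esp, ebp: now ESP points at the saved ebp */
    c.ebp = pop(&c);    /* pop ebp restores the caller's frame pointer */
    return stale;
}
```

Here one argument slot is left over, and without the MOV the following POP EBP would load that stale argument instead of the saved frame pointer.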
Upvotes: 3