X33

Reputation: 1410

Function epilogue in assembly

I'm following along with my book, whose author gives this example function with a prologue and epilogue (the function has no local variables):

1:    push ebp
2:    mov ebp, esp
3:    ...
4:    movsx eax, word ptr [ebp+8]
5:    movsx ecx, word ptr [ebp+0Ch]
6:    add eax, ecx
7:    ...
8:    mov esp, ebp
9:    pop ebp
10:   retn

that is invoked by

push eax     ; param 2
push ecx     ; param 1
call addme
add esp, 8   ; cleanup stack

In this example, isn't line 8 a redundant instruction? Isn't EBP already equal to ESP at that point? Nothing has been pushed onto or popped off the stack since the prologue.

My assumption is that this line is only necessary when the function has local variables on the stack, as a way to clear them off again.

I would just like to confirm that this is the case.

Upvotes: 2

Views: 4360

Answers (2)

Peter Cordes

Reputation: 364210

You're correct, it's redundant if you know that esp is already pointing at the location where you pushed your caller's ebp.
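
To convince yourself, it can help to track esp and ebp by hand. Here is a minimal C sketch (a toy model, not real machine code; regs_t and esp_gap_before_epilogue are names made up for illustration) that treats the two registers as plain integers and replays the prologue, with an optional number of bytes reserved for locals:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two registers involved; names are illustrative only. */
typedef struct {
    uint32_t esp;
    uint32_t ebp;
} regs_t;

/* Replays push ebp / mov ebp, esp / sub esp, local_bytes and returns how
 * far esp is below ebp just before the epilogue (line 8 in the question). */
uint32_t esp_gap_before_epilogue(uint32_t esp_at_entry, uint32_t local_bytes)
{
    regs_t r = { .esp = esp_at_entry, .ebp = 0 };

    r.esp -= 4;            /* push ebp (the saved value itself is not modeled) */
    r.ebp = r.esp;         /* mov ebp, esp */
    r.esp -= local_bytes;  /* sub esp, N: reserve locals (N = 0 in the question) */

    /* body: only loads and an add, so esp does not move */

    return r.ebp - r.esp;  /* 0 means mov esp, ebp would change nothing */
}
```

With no locals the gap is 0, so mov esp, ebp is a no-op. With, say, 16 bytes of locals the gap is 16, and something (mov esp, ebp, leave, or add esp, 16) has to undo it before pop ebp.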


When gcc compiles a function with -fno-omit-frame-pointer, it does in fact do the optimization you suggest of just popping ebp when it knows that esp is already pointing in the right place.

This is very common in functions that use call-preserved registers (like ebx) which also have to be saved/restored like ebp. Compilers typically do all the saves/restores in the prologue/epilogue before anything like reserving space for a C99 variable-size array. So pop ebx will always leave esp pointing to the right place for pop ebp.

For example, here is clang 3.8's output (with -O3 -m32) for this function on the Godbolt compiler explorer. As is common, the compiler doesn't quite produce optimal code:

void extint(int);   // a function that can't inline because the compiler can't see the definition.
int save_reg_framepointer(int a){
  extint(a);
  return a;
}

    # clang3.8
    push    ebp
    mov     ebp, esp                     # stack-frame boilerplate
    push    esi                          # save a call-preserved reg
    push    eax                          # align the stack to 16B
    mov     esi, dword ptr [ebp + 8]     # load `a` into a register that will survive the function call.
    mov     dword ptr [esp], esi         # store the arg for extint.  Doing this with an ebp-relative address would have been slightly more efficient, but just push esi here instead of push eax earlier would make even more sense
    call    extint
    mov     eax, esi                     # return value
    add     esp, 4                       # pop the arg
    pop     esi                          # restore esi
    pop     ebp                          # restore ebp.  Notice the lack of a mov  esp, ebp here, or even a  lea esp, [ebp-4]  before the first pop.
    ret

Of course, a human (borrowing a trick from gcc) could do better:

# hand-written based on tricks from gcc and clang, and avoiding their suckage
call_non_inline_and_return_arg:
    push    ebp
    mov     ebp, esp                     # stack-frame boilerplate if we have to.
    push    esi                          # save a call-preserved reg
    mov     esi, dword [ebp + 8]         # load `a` into a register that will survive the function call
    push    esi                          # replacing push eax / mov
    call    extint
    mov     eax, esi                     # return value.  Could  mov eax, [ebp+8]
    mov     esi, [ebp-4]                 # restore esi without a pop, since we know where we put it, and esp isn't pointing there.
    leave                                # same as mov esp, ebp / pop ebp.  3 uops on recent Intel CPUs
    ret

Since the stack needs to be 16-byte aligned before a call (according to the rules of the System V i386 ABI; see the links in the tag wiki), we might as well save/restore an extra register instead of just doing push [ebp+8] and then (after the call) mov eax, [ebp+8]. Compilers favour saving/restoring call-preserved registers over reloading local data multiple times.
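
The alignment arithmetic is easy to check by counting pushes. A small C sketch, assuming 4-byte pushes (32-bit mode) and that esp was 16-byte aligned just before the call that entered our function; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes pushed between our caller's aligned esp and our own inner call.
 * Assumes 32-bit mode: the return address and every push take 4 bytes. */
uint32_t bytes_below_aligned_esp(uint32_t saved_regs, uint32_t stack_args)
{
    uint32_t ret_addr = 4;  /* pushed by the call that entered us */
    return ret_addr + 4 * saved_regs + 4 * stack_args;
}

/* Nonzero if esp is 16-byte aligned again at the inner call. */
int aligned_at_call(uint32_t saved_regs, uint32_t stack_args)
{
    return bytes_below_aligned_esp(saved_regs, stack_args) % 16 == 0;
}
```

For the hand-written version above (ebp and esi saved, one pushed arg), this gives 4 + 8 + 4 = 16 bytes, so esp is aligned again at call extint. Saving only ebp would leave it at 12.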

If not for the stack-alignment rules in the current version of the ABI, I might write:

# hand-written: esp alignment not preserved on the call
call_no_stack_align:
    push    ebp
    mov     ebp, esp                     # stack-frame boilerplate if we have to.
    push    dword [ebp + 8]              # function arg.  2 uops for push with a memory operand
    call    extint                       # esp is offset by 12 from before the `call` that called us: return address, ebp, and function arg.
    mov     eax, [ebp+8]                 # return value, which extint won't have modified because it only takes one arg
    leave                                # same as mov esp, ebp / pop ebp.  3 uops on recent Intel CPUs
    ret

gcc will actually use leave instead of mov / pop, in cases where it does need to modify esp before popping ebx. For example, flip Godbolt to gcc (instead of clang), and take out -m32 so we're compiling for x86-64 (where args are passed in registers). This means there's no need to pop args off the stack after a call, so rsp is set correctly to just pop two regs. (push/pop use 8 bytes of stack, but rsp still has to be 16B-aligned before a call in the SysV AMD64 ABI, so gcc actually does a sub rsp, 8 and corresponding add around the call.)

Another missed optimization: with gcc -m32, the variable-length-array function uses an add esp, 16 / leave after the call. The add is totally useless. (Add -m32 to the gcc args on godbolt).
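
The variable-length-array function itself is not reproduced here. As a rough idea of the kind of code that makes the compiler move esp by a run-time amount, here is a hypothetical C sketch (vla_sum and the stub body of extint are made up for illustration; in a Godbolt experiment extint would only be declared, so it can't inline):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in definition so the example links; an assumption, not the
 * original code. */
void extint(int x) { (void)x; }

/* Hypothetical VLA function: the frame size depends on n, so esp moves
 * by an amount that is not known at compile time. */
int vla_sum(size_t n)
{
    int buf[n > 0 ? n : 1];  /* C99 variable-length array, at least 1 element */
    int sum = 0;

    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)i;
        extint(buf[i]);      /* non-inlined call from inside the frame */
        sum += buf[i];
    }
    return sum;
}
```

Because the size of buf isn't a compile-time constant, the compiler can't undo the allocation with a fixed add esp, N, so restoring esp from the frame pointer (mov esp, ebp or leave) is typically what you'll see in the epilogue.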

Upvotes: 6

Al Kepp

Reputation: 5980

You don't know what is on lines 3 and 7, so I assume line 8 is not redundant in the general case. Normally the function would work without line 8, because the value of ESP at the end of your function will normally be the same as at the start of your function. But I can imagine some dirty scenarios where line 8 is used to clean something up: for example, if you do a push/call sequence and omit the final ADD ESP,n. Then you can simply use MOV ESP,EBP to fix ESP at the end of your function. Dirty, but it works.
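
That scenario can be sketched with the same kind of bookkeeping. This is a toy C model under the assumption that every push is 4 bytes; esp_before_pop_ebp is a made-up name:

```c
#include <assert.h>
#include <stdint.h>

/* Returns esp just before pop ebp, given how many 4-byte arguments the
 * body pushed for calls without cleaning them up, and whether the
 * epilogue contains mov esp, ebp. Hypothetical model, 32-bit mode. */
uint32_t esp_before_pop_ebp(uint32_t esp_at_entry,
                            uint32_t leftover_pushes,
                            int has_mov_esp_ebp)
{
    uint32_t esp = esp_at_entry - 4;  /* push ebp */
    uint32_t ebp = esp;               /* mov ebp, esp */

    esp -= 4 * leftover_pushes;       /* push args / call, but no add esp, n */

    if (has_mov_esp_ebp)
        esp = ebp;                    /* mov esp, ebp discards the leftovers */

    return esp;
}
```

With two leftover pushes and no mov esp, ebp, esp ends up 8 bytes below the saved ebp, so pop ebp would load the wrong value and ret would jump to garbage. With the mov, esp is back at the saved ebp regardless of what the body left behind.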

Upvotes: 3
