IS4

Reputation: 13207

Jump/tailcall to another function

I have two functions, looking like this in C++:

void f1(...);
void f2(...);

I can change the body of f1, but f2 is defined in another library that I cannot change. I absolutely have to call f2 from inside f1 and pass along all the arguments that f1 received, but as far as I know this is impossible in pure C or C++. Unfortunately, there is no variant of f2 that accepts a va_list. The call to f2 is the last thing f1 does, so I need some form of tail call.

I decided to use assembly to pop the current function's stack frame and then jump to f2. (f2 is actually received as a function pointer stored in a variable, which is why I first load it into a register.)

__asm {
    mov eax, f2
    leave
    jmp eax
}

In MSVC++, in Debug, it appears to work at first, but it somehow messes with the return values of other functions, and sometimes it crashes. In Release, it always crashes.

Is this assembly code incorrect, or do some optimizations of the compiler somehow break this code?

Upvotes: 1

Views: 485

Answers (2)

Peter Cordes

Reputation: 364458

You have to write f1 in pure asm for it to be guaranteed-safe.

In all the major x86 calling conventions, the callee "owns" the args and can modify the stack space that held them, whether or not the C source changes them and whether or not they're declared const.

e.g. void foo(int x) { x += 1; bar(x); } might modify the stack space above the return address that holds x, if compiled with optimization disabled. Making another call with the same args requires storing them again unless you know the callee hasn't stepped on them. The same argument applies for tailcalling from the end of one function.

I checked on the Godbolt compiler explorer; both MSVC and gcc do in fact modify x on the stack in debug builds. gcc uses add DWORD PTR [ebp+8], 1 before pushing [ebp+8].
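As a minimal sketch of the foo/bar example above in compilable form (the names last_seen and demo are hypothetical, added only for illustration): at the language level the caller's variable is never touched, but the callee is free to update its own copy, and for stack-args conventions that copy typically lives in the incoming argument slot.

```cpp
#include <cassert>

// Hypothetical callee: records the value it was passed.
static int last_seen = 0;
void bar(int x) { last_seen = x; }

// foo modifies its own copy of x. With stack args and optimization disabled,
// compilers typically apply this update to the incoming argument slot itself.
void foo(int x) {
    x += 1;
    bar(x);
}

// Hypothetical usage: shows what the caller and bar each observe.
void demo() {
    int a = 41;
    foo(a);
    assert(a == 41);          // the caller's variable is untouched
    assert(last_seen == 42);  // bar received the incremented copy
}
```

So after foo has run its body, the stack slot that originally held 41 may now hold 42, which is why jumping to another function and expecting it to see the original args is not safe in general.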


Compilers in practice may not actually take advantage of this for variadic functions, though, so depending on the definitions of your functions, you might get away with it if you can convince them to make a tailcall.

Note that void bar(...); is not a valid prototype in C, though:

# gcc -xc on Godbolt to force compiling as C, not C++
<source>:1:10: error: ISO C requires a named argument before '...'

It is valid in C++, or at least g++ accepts it while gcc doesn't. MSVC accepts it in C++ mode, but not in C mode. (Godbolt has a whole separate C mode with a different set of compilers, which you can use to get MSVC to compile code as C instead of C++. I don't know a command-line option to flip it to C mode the way gcc has -xc and -xc++)


Anyway, it might work (in optimized builds) to write f2(); at the end of f1, but that's nasty and completely lies to the compiler about what args are passed. And of course it only works for a calling convention with no register args. (But you were showing 32-bit asm, so you might well be using a calling convention with no register args.)

Any decent compiler will use jmp f2 to make an optimized tail-call in this case, because they both return void. (For non-void, you would return f2();)
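A rough sketch of the two source-level shapes that put a call in tail position, using hypothetical no-argument stand-ins (f2_void, f2_int, calls, demo) rather than the real variadic f2. It only shows the shape; whether the compiler actually emits jmp instead of call/ret depends on the optimization level and is not guaranteed by the language.

```cpp
#include <cassert>

// Hypothetical stand-ins for f2; in the question, f2 lives in another library.
static int calls = 0;
void f2_void() { ++calls; }
int f2_int() { return 7; }

// void case: the call is the last thing f1 does, so nothing remains to be
// done after it returns and an optimizing compiler may turn it into a jump.
void f1_void() {
    // ... f1's own work would go here ...
    f2_void();
}

// non-void case: returning the callee's result directly keeps the call in
// tail position.
int f1_int() {
    // ... f1's own work would go here ...
    return f2_int();
}

// Hypothetical usage.
void demo() {
    f1_void();
    assert(calls == 1);
    assert(f1_int() == 7);
}
```

Anything that must run after the call (a destructor for a local object, or a conversion of the return value) takes the call out of tail position.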


BTW, if mov eax, f2 works, then jmp f2 will also work.

Your code can't work in an optimized build, though, because you're assuming that the compiler made a legacy stack-frame, and that the function won't inline anywhere.

It's unsafe even in a debug build because the compiler may have pushed some call-preserved registers that need to be popped before leaving the function (and before running leave to destroy the stack frame).


The trampoline idea that @mevets showed could maybe be simplified: if there's a reasonable fixed upper size limit on the args, you can copy maybe 64 or 128 bytes of potential-args from your incoming args into args for f1. A few SIMD vectors will do it. Then you can call f1 normally, then tail-call f2 from your asm wrapper.

If there are potentially register args, save them to stack space before the args you copy, and restore them before tailcalling.

Upvotes: 1

mevets

Reputation: 10445

The compiler will make no guarantees at the point you are digging around. A trampoline function might work, but you have to save state between them, and do a lot of digging around.

Here is a skeleton, but you will need to know a lot about calling conventions, class method invocation, etc.

/* argn, ..., arg0, retaddr */
trampoline:
    push < all volatile regs >
    call <get thread local storage >
    copy < volatile regs and ret addr > to < local storage >
    pop < volatile regs >
    remove ret addr
    call  f2
    call < get thread local storage >
    restore < volatile regs and ret addr>
    jmp f1
    ret

Upvotes: 2
