mtijanic

Reputation: 2902

Make a variable argument function callee cleanup

Suppose I have a function:

#include <stdarg.h>

int sumN(int n, ...)
{
    int sum = 0;
    va_list vl;
    va_start(vl, n);
    for (int i = 0; i < n; i++)
        sum += va_arg(vl, int);

    va_end(vl);
    return sum;
}

It is called as sumN(3, 10, 20, 30). The function is cdecl, which means the caller cleans up the stack. So, what happens is something like:

; Push arguments right-to-left
push 30
push 20
push 10
push 3
call sumN
add esp, 16 ; Remove arguments from stack (equivalent to 4 pops)

For regular functions that take a fixed number of arguments, the callee can perform the cleanup as part of the ret instruction (e.g. ret 16). That doesn't work here because the callee can't know how many arguments were pushed: I could call it as sumN(1, 10, 20, 30, 40, 50); and corrupt the stack.
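For reference, the fixed-arity case can be sketched like this. It is only an illustration: CALLEE_CLEANUP and sum3 are made-up names, and the macro deliberately expands to nothing outside 32-bit x86, where stdcall does not apply.

```c
/* Hypothetical macro: pick a callee-cleanup convention where one exists. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CALLEE_CLEANUP __stdcall
#elif (defined(__GNUC__) || defined(__clang__)) && defined(__i386__)
#  define CALLEE_CLEANUP __attribute__((stdcall))
#else
#  define CALLEE_CLEANUP /* no such convention on this target */
#endif

/* Fixed number of arguments, so the callee knows how many bytes to pop. */
int CALLEE_CLEANUP sum3(int a, int b, int c)
{
    return a + b + c;
}
```

On 32-bit x86 a function like this returns with ret 12 (three 4-byte arguments), so the caller emits no add esp after the call.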

Now, I want to do it anyway. Maybe I have a tool that parses the source code before the build and makes sure all calls are legitimate. And I'm calling sumN() 50k times in my codebase, so the extra size from the add esp instruction after every call adds up.

For the above implementation, it's easily done in assembly, but if it were a printf function or something where the logic to figure out the size is a bit more complex, that's no longer an option. Still, I could do some inline assembly or something and fix the implementation of sumN to pop the stack. But if anyone has a better solution, that's very welcome.

The big question, however, is how to tell the compiler that the function is callee cleanup when it has ... in its declaration. How do I prevent the compiler from generating the add esp, 16 instruction?

Ideally I need this for msvc, gcc and clang, but msvc is a priority.

Related: Can stdcall have a variable arguments?

Upvotes: 0

Views: 343

Answers (1)

vguberinic

Reputation: 317

What you can do is make a number of helper functions. Each helper function would take a fixed number of elements, and picking which helper function to call would be done at compile time. Then, each helper function would call your vararg function.

You will save one instruction per call, at the cost of n helper functions, where n is the maximum number of arguments you need to support.

Sample code:

#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>

#define GET_MACRO(_1,_2,_3,NAME,...) NAME
#define func(...) GET_MACRO(__VA_ARGS__, helper3, helper2, helper1)(__VA_ARGS__)

void varargFn(int n, ...)
{
        int sum = 0;
        va_list vl;
        va_start(vl, n);
        for (int i = 0; i < n; i++)
                /* The helpers pass void *, so read the same type back. */
                sum += (int)(intptr_t)va_arg(vl, void *);

        va_end(vl);
        printf("%d\n", sum);
}

void helper1(void *v1)
{
        varargFn(1, v1);
}

void helper2(void *v1, void *v2)
{
        varargFn(2, v1, v2);
}

void helper3(void *v1, void *v2, void *v3)
{
        varargFn(3, v1, v2, v3);
}

int main()
{
        func((void *) 5);
        func((void *) 5, (void *) 5);
        func((void *) 5, (void *) 5, (void *) 5);

        return 0;
}
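The same idea applied to the sumN from the question, as a rough sketch that takes plain ints and returns the result. The names sum1, sum2, sum3, PICK_SUM and SUM are hypothetical. The trailing 0 in SUM is a dummy that keeps the variadic part of PICK_SUM non-empty, which strict C99 requires. Marking the helpers as callee cleanup is left out here.

```c
#include <stdarg.h>

/* The variadic function from the question, unchanged. */
int sumN(int n, ...)
{
    int sum = 0;
    va_list vl;
    va_start(vl, n);
    for (int i = 0; i < n; i++)
        sum += va_arg(vl, int);
    va_end(vl);
    return sum;
}

/* One fixed-arity helper per supported argument count. */
int sum1(int a)               { return sumN(1, a); }
int sum2(int a, int b)        { return sumN(2, a, b); }
int sum3(int a, int b, int c) { return sumN(3, a, b, c); }

/* Select the helper from the number of arguments, at preprocessing time. */
#define PICK_SUM(_1, _2, _3, NAME, ...) NAME
#define SUM(...) PICK_SUM(__VA_ARGS__, sum3, sum2, sum1, 0)(__VA_ARGS__)
```

A call such as SUM(10, 20, 30) expands to sum3(10, 20, 30), so the call site never passes the count itself.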

And a short snippet of the assembly generated by running gcc -S -Os -std=c99:

helper3:
.LFB14:
        .cfi_startproc
        movq    %rdx, %rcx
        xorl    %eax, %eax
        movq    %rsi, %rdx
        movq    %rdi, %rsi
        movl    $3, %edi
        jmp     varargFn
        .cfi_endproc
.LFE14:
        .size   helper3, .-helper3
        .section        .text.startup,"ax",@progbits
        .globl  main
        .type   main, @function
main:
.LFB15:
        .cfi_startproc
        pushq   %rax
        .cfi_def_cfa_offset 16
        movl    $5, %edi
        call    helper1
        movl    $5, %esi
        movl    $5, %edi
        call    helper2
        movl    $5, %edx
        movl    $5, %esi
        movl    $5, %edi
        call    helper3
        xorl    %eax, %eax
        popq    %rdx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE15:
        .size   main, .-main

You could probably squeeze a couple more bytes out of the helper functions if you manage to avoid this nasty shift of n arguments across registers. One idea that comes to mind is to rewrite helper3 as:

void helper3(void *v1, void *v2, void *v3)
{
    varargFn(3, v2, v3, v1);
}

but then you would have to modify your varargFn to expect the first argument in the last slot, which might not be worth the trouble.
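To give an idea of what that modification could look like, here is a minimal sketch. It assumes at most MAX_ARGS arguments and that the helpers pass them rotated as (v2, ..., vn, v1). The names varargFnRotated, helper3Rotated and MAX_ARGS are made up, and the weighted sum is only there to make the result depend on argument order.

```c
#include <stdarg.h>
#include <stdint.h>

#define MAX_ARGS 3 /* hypothetical upper bound on the argument count */

/* Expects its variadic arguments rotated: v2, v3, ..., vn, v1. */
long varargFnRotated(int n, ...)
{
    intptr_t args[MAX_ARGS];
    long result = 0;
    va_list vl;

    if (n < 1 || n > MAX_ARGS)
        return 0;

    va_start(vl, n);
    /* Slot i holds logical argument (i + 1) % n, so undo the rotation. */
    for (int i = 0; i < n; i++)
        args[(i + 1) % n] = (intptr_t)va_arg(vl, void *);
    va_end(vl);

    /* Order-dependent work: 1*v1 + 2*v2 + ... + n*vn. */
    for (int i = 0; i < n; i++)
        result += (long)(i + 1) * (long)args[i];
    return result;
}

long helper3Rotated(void *v1, void *v2, void *v3)
{
    return varargFnRotated(3, v2, v3, v1);
}
```

If the work inside varargFn does not depend on argument order, as with a plain sum, none of this is needed and the rotated call works as is.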

Upvotes: 2
