Reputation: 2902
Suppose I have a function:
#include <stdarg.h>

int sumN(int n, ...)
{
    int sum = 0;
    va_list vl;
    va_start(vl, n);
    for (int i = 0; i < n; i++)
        sum += va_arg(vl, int);
    va_end(vl);
    return sum;
}
Called as sumN(3, 10, 20, 30);
The function is cdecl, which means caller cleanup. So, what happens is something like:
; Push arguments right-to-left
push 30
push 20
push 10
push 3
call sumN
add esp, 16 ; Remove arguments from stack (equivalent to 4 pops)
For regular functions that take a fixed number of arguments, the callee can perform the cleanup as part of the ret instruction (e.g. ret 16). That doesn't work here because the callee can't know how many arguments were pushed: I could call it as sumN(1, 10, 20, 30, 40, 50); and corrupt the stack.
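For contrast with sumN, here is a minimal sketch of the fixed-arity case, assuming a 32-bit x86 target. The function sum3 and the CALLEE_CLEANUP macro are made-up names for illustration; __stdcall (MSVC) and __attribute__((stdcall)) (gcc, clang) are the usual spellings of a callee-cleanup convention there, and on any other target the macro expands to nothing.

```c
/* Illustrative macro (not from the original post): expands to a
   callee-cleanup convention on 32-bit x86 only. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CALLEE_CLEANUP __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define CALLEE_CLEANUP __attribute__((stdcall))
#else
#  define CALLEE_CLEANUP /* other targets: keep the default convention */
#endif

/* Fixed arity: the argument size (3 * 4 = 12 bytes on 32-bit x86)
   is known when the function is compiled. */
int CALLEE_CLEANUP sum3(int a, int b, int c)
{
    return a + b + c;
}
```

Because the argument size is a compile-time constant here, the callee can typically return with ret 12 on 32-bit x86, and a call such as sum3(10, 20, 30) needs no add esp afterwards. That constant is exactly what a function declared with ... does not have.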
Now, I want to do it anyway. Maybe I have a tool that parses the source code before the build and makes sure all calls are legitimate. And I'm calling sumN() 50k times in my codebase, so the extra size from the add esp instruction after each call adds up.
For the above implementation, it's easily done in assembly, but if it were a printf function or something where the logic to figure out the size is a bit more complex, that's no longer an option. Still, I could do some inline assembly or something and fix the implementation of sumN to pop the stack. But if anyone has a better solution, that's very welcome.
The big question, however, is how to tell the compiler that the function is callee cleanup when it has ... in its declaration. How do I prevent the compiler from generating the add esp, 16 instruction?
Ideally I need this for msvc, gcc and clang, but msvc is a priority.
Related: Can stdcall have a variable arguments?
Upvotes: 0
Views: 343
Reputation: 317
What you can do is make a number of helper functions. Each helper function would take a fixed number of arguments, and picking which helper function to call would be done at compile time. Then, each helper function would call your vararg function.
You will save one instruction per call, at a cost of n helper functions, where n is the maximal number of possible arguments.
Sample code:
#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>

/* Selects helper1..helper3 according to the number of arguments given to func(). */
#define GET_MACRO(_1,_2,_3,NAME,...) NAME
#define func(...) GET_MACRO(__VA_ARGS__, helper3, helper2, helper1)(__VA_ARGS__)

void varargFn(int n, ...)
{
    int sum = 0;
    va_list vl;
    va_start(vl, n);
    for (int i = 0; i < n; i++)
        /* The helpers pass void *, so read each argument back as void *. */
        sum += (int)(intptr_t)va_arg(vl, void *);
    va_end(vl);
    printf("%d\n", sum);
}

void helper1(void *v1)
{
    varargFn(1, v1);
}

void helper2(void *v1, void *v2)
{
    varargFn(2, v1, v2);
}

void helper3(void *v1, void *v2, void *v3)
{
    varargFn(3, v1, v2, v3);
}

int main(void)
{
    func((void *) 5);
    func((void *) 5, (void *) 5);
    func((void *) 5, (void *) 5, (void *) 5);
    return 0;
}
And a short snippet of the assembly generated by running gcc -S -Os -std=c99:
helper3:
.LFB14:
.cfi_startproc
movq %rdx, %rcx
xorl %eax, %eax
movq %rsi, %rdx
movq %rdi, %rsi
movl $3, %edi
jmp varargFn
.cfi_endproc
.LFE14:
.size helper3, .-helper3
.section .text.startup,"ax",@progbits
.globl main
.type main, @function
main:
.LFB15:
.cfi_startproc
pushq %rax
.cfi_def_cfa_offset 16
movl $5, %edi
call helper1
movl $5, %esi
movl $5, %edi
call helper2
movl $5, %edx
movl $5, %esi
movl $5, %edi
call helper3
xorl %eax, %eax
popq %rdx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE15:
.size main, .-main
You could probably squeeze a couple more bytes out of the helper functions if you manage to avoid this nasty shift of n arguments across registers (the movq instructions at the top of helper3 above). One idea that comes to mind is to rewrite helper3 as:
void helper3(void *v1, void *v2, void *v3)
{
    varargFn(3, v2, v3, v1);
}
but then you would have to modify your varargFn to expect the first argument in the last position, which might not be worth the trouble.
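To give an idea of what that modification might look like, here is a rough sketch under a few assumptions: the helpers rotate their arguments as in the rewritten helper3, at most three arguments are passed, and the callee cares about argument order. The names weightedFn, helper1r, helper2r and helper3r are hypothetical, and the weighted sum is only there to make the order observable.

```c
#include <stdarg.h>
#include <stdint.h>

enum { MAX_ARGS = 3 };  /* assumed cap, matching helper1..helper3 */

/* Receives n arguments in rotated order (v2, ..., vn, v1) and returns
   1*v1 + 2*v2 + ... + n*vn, i.e. it restores the logical order itself. */
long weightedFn(int n, ...)
{
    void *rotated[MAX_ARGS];
    long total = 0;
    va_list vl;

    if (n < 1 || n > MAX_ARGS)
        return 0;

    va_start(vl, n);
    for (int i = 0; i < n; i++)
        rotated[i] = va_arg(vl, void *);
    va_end(vl);

    for (int i = 0; i < n; i++) {
        /* Logical argument i sits one slot earlier; the first one wraps
           around to the last slot. */
        void *arg = rotated[(i + n - 1) % n];
        total += (long)(i + 1) * (long)(intptr_t)arg;
    }
    return total;
}

/* Helpers pass their first argument last, as in the rewritten helper3. */
long helper1r(void *v1)                     { return weightedFn(1, v1); }
long helper2r(void *v1, void *v2)           { return weightedFn(2, v2, v1); }
long helper3r(void *v1, void *v2, void *v3) { return weightedFn(3, v2, v3, v1); }
```

For a plain sum like the one in varargFn the order does not matter and nothing needs to change; the index arithmetic is only needed when the callee depends on argument positions, as a printf-style function would.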
Upvotes: 2