My code looks like this:

```cpp
#include <string>

template<typename F>
void printHello(F f)
{
    f("hello");
}

int main() {
    std::string buf;
    printHello([&buf](const char* msg) { buf += msg; });
    printHello([&buf]() { });
}
```
The question is: how can I restrict the `F` type to accept only lambdas that have the signature `void(const char*)`, so that the second call to `printHello` doesn't fail at some obscure place inside `printHello`, but instead on the line that calls `printHello`?
I know that `std::function` can solve it in this particular case (and it is what I'd use if I really wanted to print 'hello'). But `std::function` is really something else and comes at a cost (however small that cost is; as of today, April 2016, GCC and MSVC cannot optimize away the virtual call). So my question can be seen as purely academic: is there a "template" way to solve it?
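For comparison, the `std::function` approach the question mentions might look like the sketch below (`demo` is only an illustrative name). Because the parameter type spells out `void(const char*)`, a lambda that cannot be called with a `const char*` is rejected at the call site: since C++14 the standard says `std::function`'s converting constructor takes part in overload resolution only for callables invocable with the signature's argument types. The price is the type erasure the question wants to avoid.

```cpp
#include <cassert>
#include <functional>
#include <string>

// The required signature is part of the parameter type, so a mismatching
// lambda fails when it is converted to std::function at the call site.
void printHello(const std::function<void(const char*)>& f) {
    f("hello");
}

// Usage mirroring the question.
inline void demo() {
    std::string buf;
    printHello([&buf](const char* msg) { buf += msg; });
    assert(buf == "hello");
    // printHello([&buf]() { });  // error on this line: no conversion to std::function
}
```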
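Well... Just use SFINAE:

```cpp
#include <utility>  // std::declval

template<typename T>
auto printHello(T f) -> void_t<decltype(f(std::declval<const char*>()))> {
    f("hello");
}
```

And `void_t` is implemented as (C++17 provides it as `std::void_t` in `<type_traits>`):

```cpp
template<typename...>
using void_t = void;
```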
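The return type acts as a constraint on the argument passed to your function. If the expression inside the `decltype` is ill-formed for the given `T`, substitution fails, `printHello` drops out of overload resolution, and the error is reported at the call site ("no matching function for call to `printHello`") instead of somewhere inside the function body.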
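A self-contained C++11 sketch of this answer follows, with two changes worth flagging. First, `void_t` goes through a helper struct (`make_void`, a name chosen here), because with the bare alias some older compilers ignore the unused argument and the constraint silently disappears (CWG issue 1558). Second, `accepts_c_string` is a hypothetical detection trait, added only so the constraint can be exercised with `static_assert`. Note that this checks callability, not an exact signature: a lambda taking `std::string` is also accepted because `const char*` converts to it, and the return type is not checked.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// void_t routed through a struct so its argument is always substituted
// (sidesteps CWG 1558 on older compilers; C++17 has std::void_t).
template<typename... Ts> struct make_void { using type = void; };
template<typename... Ts> using void_t = typename make_void<Ts...>::type;

// Viable only if f(const char*) is a valid expression; otherwise substitution
// fails and the caller sees "no matching function for call to printHello".
template<typename F>
auto printHello(F f) -> void_t<decltype(f(std::declval<const char*>()))> {
    f("hello");
}

// Hypothetical detection trait (not from the answer): true when an lvalue F
// can be called with a const char*, mirroring the constraint above.
template<typename F, typename = void>
struct accepts_c_string : std::false_type {};

template<typename F>
struct accepts_c_string<F, void_t<decltype(std::declval<F&>()(std::declval<const char*>()))>>
    : std::true_type {};

// Usage mirroring the question.
inline void demo() {
    std::string buf;
    printHello([&buf](const char* msg) { buf += msg; });
    assert(buf == "hello");
    // printHello([&buf]() { });  // would fail on this line: no matching function
}
```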
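Unless you're using an ancient standard library, `std::function` has an optimisation for small function objects (of which yours is one): a callable that small is stored inside the `std::function` object itself, with no heap allocation. In practice you will see no measurable performance reduction.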
People who tell you not to use `std::function` for performance reasons are the very same people who 'optimise' code before measuring where the bottlenecks are.
Write the code that expresses intent. If it becomes a performance bottleneck (it won't), then look at changing it.
I once worked on a financial forwards pricing system. Someone decided that it ran too slowly (64 cores, multiple server boxes, hundreds of thousands of discrete algorithms running in parallel in a massive DAG). So we profiled it.
What did we find?
The processing took almost no time at all. The program spent 99% of its time converting doubles to strings and strings to doubles at the boundaries of the IO, where we had to communicate with a message bus.
Using a lambda in place of a `std::function` for the callbacks would have made no difference whatsoever.
Write elegant code. Express your intent clearly. Compile with optimisations. Marvel as the compiler does its job and turns your 100 lines of C++ into 5 machine code instructions.
A simple demonstration:
```cpp
#include <functional>

// external function forces an actual function call
extern void other_func(const char* p);

// try to force std::function to call polymorphically
void test_it(const std::function<void(const char*)>& f, const char* p)
{
    f(p);
}

int main()
{
    // make our function object
    auto f = std::function<void(const char*)>([](const char* p) { other_func(p); });
    // three strings (L_.str, L_.str1, L_.str2 in the listing below); values are placeholders
    const char* const data[] = { "foo", "bar", "baz" };
    // call it in a tight loop
    for(auto p : data) {
        test_it(f, p);
    }
}
```
Compiled with Apple clang at -O2:
```asm
.globl _main
.align 4, 0x90
_main: ## @main
.cfi_personality 155, ___gxx_personality_v0
.cfi_lsda 16, Lexception1
## BB#0: ## %_ZNKSt3__18functionIFvPKcEEclES2_.exit.i
# the normal stack frame stuff...
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
pushq %r15
pushq %r14
pushq %rbx
subq $72, %rsp
.cfi_offset %rbx, -40
.cfi_offset %r14, -32
.cfi_offset %r15, -24
movq ___stack_chk_guard@GOTPCREL(%rip), %rbx
movq (%rbx), %rbx
movq %rbx, -32(%rbp)
leaq -80(%rbp), %r15
movq %r15, -48(%rbp)
# take the address of std::function's vtable... we'll need it (once)
leaq __ZTVNSt3__110__function6__funcIZ4mainE3$_0NS_9allocatorIS2_EEFvPKcEEE+16(%rip), %rax
# here's the tight loop...
movq %rax, -80(%rbp)
leaq L_.str(%rip), %rdi
movq %rdi, -88(%rbp)
# oh look! std::function's call has been TOTALLY INLINED!!
callq __Z10other_funcPKc
LBB1_2: ## %_ZNSt3__110__function6__funcIZ4mainE3$_0NS_9allocatorIS2_EEFvPKcEEclEOS6_.exit
## =>This Inner Loop Header: Depth=1
# notice that the loop itself uses more instructions than the call??
leaq L_.str1(%rip), %rax
movq %rax, -88(%rbp)
movq -48(%rbp), %rdi
testq %rdi, %rdi
je LBB1_1
## BB#3: ## %_ZNKSt3__18functionIFvPKcEEclES2_.exit.i.1
## in Loop: Header=BB1_2 Depth=1
# 2nd and 3rd calls: load operator() from the vtable (offset 48) and call it indirectly
movq (%rdi), %rax
movq 48(%rax), %rax
leaq -88(%rbp), %rsi
callq *%rax
## BB#4: ## in Loop: Header=BB1_2 Depth=1
leaq L_.str2(%rip), %rax
movq %rax, -88(%rbp)
movq -48(%rbp), %rdi
testq %rdi, %rdi
jne LBB1_5
# the rest of this function is exception handling. Executed at most
# once, in exceptional circumstances. Therefore, irrelevant.
LBB1_1: ## in Loop: Header=BB1_2 Depth=1
movl $8, %edi
callq ___cxa_allocate_exception
movq __ZTVNSt3__117bad_function_callE@GOTPCREL(%rip), %rcx
addq $16, %rcx
movq %rcx, (%rax)
movq __ZTINSt3__117bad_function_callE@GOTPCREL(%rip), %rsi
movq __ZNSt3__117bad_function_callD1Ev@GOTPCREL(%rip), %rdx
movq %rax, %rdi
callq ___cxa_throw
jmp LBB1_2
LBB1_9: ## %.loopexit.split-lp
jmp LBB1_10
LBB1_5: ## %_ZNKSt3__18functionIFvPKcEEclES2_.exit.i.2
movq (%rdi), %rax
movq 48(%rax), %rax
leaq -88(%rbp), %rsi
callq *%rax
## BB#6:
# destructor called once (constant time, therefore irrelevant)
movq -48(%rbp), %rdi
cmpq %r15, %rdi
je LBB1_7
## BB#15:
testq %rdi, %rdi
je LBB1_17
## BB#16:
movq (%rdi), %rax
callq *40(%rax)
jmp LBB1_17
movq -80(%rbp), %rax
leaq -80(%rbp), %rdi
callq *32(%rax)
LBB1_17: ## %_ZNSt3__18functionIFvPKcEED1Ev.exit
cmpq -32(%rbp), %rbx
jne LBB1_19
## BB#18: ## %_ZNSt3__18functionIFvPKcEED1Ev.exit
xorl %eax, %eax
addq $72, %rsp
popq %rbx
popq %r14
popq %r15
popq %rbp
```
Can we stop arguing about performance now please?
You can add compile-time checking for template parameters by defining constraints.
This lets you catch such errors early, and there is no runtime overhead, because current compilers generate no code for a constraint like this.
For example, we can define this constraint:
```cpp
template<class F, class T> struct CanCall
{
    static void constraints(F f, T a) { f(a); }
    CanCall() { void(*p)(F, T) = constraints; }
};
```

`CanCall` checks (at compile time) that an `F` can be called with a `T`. You can then use it in `printHello`:

```cpp
template<typename F>
void printHello(F f)
{
    CanCall<F, const char*>();
    f("hello");
}
```
As a result, compilers also give readable error messages when a constraint fails.
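Putting this answer's pieces into one C++11 translation unit gives the sketch below; the only change is `(void)p;`, added here to silence unused-variable warnings, and `demo` is an illustrative name. The reason the check works: taking the address of `constraints` odr-uses it, which forces the compiler to instantiate its body and type-check `f(a)`, even though `constraints` is never called at run time. A mismatching lambda therefore fails inside `CanCall<F, T>::constraints`, with the instantiation chain leading back to the offending call.

```cpp
#include <cassert>
#include <string>

// Constraint class from the answer: constructing a CanCall<F, T> forces
// constraints() to be instantiated, so f(a) must type-check.
template<class F, class T>
struct CanCall {
    static void constraints(F f, T a) { f(a); }
    CanCall() {
        void (*p)(F, T) = constraints;  // taking the address instantiates constraints()
        (void)p;                        // otherwise unused; silences warnings
    }
};

template<typename F>
void printHello(F f) {
    CanCall<F, const char*>();  // compile error unless F is callable with const char*
    f("hello");
}

// Usage mirroring the question.
inline void demo() {
    std::string buf;
    printHello([&buf](const char* msg) { buf += msg; });
    assert(buf == "hello");
    // printHello([&buf]() { });  // error points into CanCall<...>::constraints
}
```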