rustyx

Reputation: 85530

How to restrict template functor return and parameter types

My code looks like this:

template<typename F>
void printHello(F f)
{
    f("Hello!");
}

int main() {
    std::string buf;
    printHello([&buf](const char* msg) { buf += msg; });
    printHello([&buf]() { });
}

The question is: how can I restrict the F type to accept only callables with the signature void(const char*), so that the second call to printHello doesn't fail at some obscure place inside printHello, but instead on the line that calls printHello incorrectly?

==EDIT==

I know that std::function can solve it in this particular case (it's what I'd use if I really wanted to print 'hello'). But std::function is really something else and comes at a cost (however small that cost is; as of today, April 2016, GCC and MSVC cannot optimize away the virtual call). So my question can be seen as purely academic: is there a "template" way to solve it?

Upvotes: 3

Views: 370

Answers (3)

Guillaume Racicot

Reputation: 41840

Well... Just use SFINAE

template<typename T>
auto printHello(T f) -> void_t<decltype(f(std::declval<const char*>()))> {
    f("hello");
}

And void_t is implemented as:

template<typename...>
using void_t = void;

The return type will act as a constraint on the parameter sent to your function. If the expression inside the decltype cannot be evaluated, it will result in an error at the call site.

Upvotes: 1

Richard Hodges

Reputation: 69922

Unless you're using an ancient standard library, std::function will have optimisations for small function objects (of which yours is one). You will see no performance reduction whatsoever.

People who tell you not to use std::function because of performance reasons are the very same people who 'optimise' code before measuring performance bottlenecks.

Write the code that expresses intent. If it becomes a performance bottleneck (it won't), then look at changing it.

I once worked on a financial forwards pricing system. Someone decided that it ran too slowly (64 cores, multiple server boxes, hundreds of thousands of discrete algorithms running in parallel in a massive DAG). So we profiled it.

What did we find?

The processing took almost no time at all. The program spent 99% of its time converting doubles to strings and strings to doubles at the boundaries of the IO, where we had to communicate with a message bus.

Using a lambda in place of a std::function for the callbacks would have made no difference whatsoever.

Write elegant code. Express your intent clearly. Compile with optimisations. Marvel as the compiler does its job and turns your 100 lines of c++ into 5 machine code instructions.

A simple demonstration:

#include <functional>

// external function forces an actual function call
extern void other_func(const char* p);

// try to force std::function to call polymorphically    
void test_it(const std::function<void(const char*)>& f, const char* p)
{
    f(p);
}

int main()
{
    // make our function object
    auto f = std::function<void(const char*)>([](const char* p) { other_func(p); });

    const char* const data[] = {
        "foo",
        "bar",
        "baz"
    };

    // call it in a tight loop
    for(auto p : data) {
        test_it(f, p);
    }
}

Compile with Apple clang, -O2:

result:

    .globl  _main
    .align  4, 0x90
_main:                                  ## @main
Lfunc_begin1:
    .cfi_startproc
    .cfi_personality 155, ___gxx_personality_v0
    .cfi_lsda 16, Lexception1
## BB#0:                                ## %_ZNKSt3__18functionIFvPKcEEclES2_.exit.i

#
# the normal stack frame stuff...
#
    pushq   %rbp
Ltmp13:
    .cfi_def_cfa_offset 16
Ltmp14:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp15:
    .cfi_def_cfa_register %rbp
    pushq   %r15
    pushq   %r14
    pushq   %rbx
    subq    $72, %rsp
Ltmp16:
    .cfi_offset %rbx, -40
Ltmp17:
    .cfi_offset %r14, -32
Ltmp18:
    .cfi_offset %r15, -24
    movq    ___stack_chk_guard@GOTPCREL(%rip), %rbx
    movq    (%rbx), %rbx
    movq    %rbx, -32(%rbp)
    leaq    -80(%rbp), %r15
    movq    %r15, -48(%rbp)
#
# take the address of std::function's vtable... we'll need it (once)
#
    leaq    __ZTVNSt3__110__function6__funcIZ4mainE3$_0NS_9allocatorIS2_EEFvPKcEEE+16(%rip), %rax
#
# here's the tight loop...
#
    movq    %rax, -80(%rbp)
    leaq    L_.str(%rip), %rdi
    movq    %rdi, -88(%rbp)
Ltmp3:
#
# oh look! std::function's call has been TOTALLY INLINED!!
#
    callq   __Z10other_funcPKc
Ltmp4:
LBB1_2:                                 ## %_ZNSt3__110__function6__funcIZ4mainE3$_0NS_9allocatorIS2_EEFvPKcEEclEOS6_.exit
                                        ## =>This Inner Loop Header: Depth=1
#
# notice that the loop itself uses more instructions than the call??
#

    leaq    L_.str1(%rip), %rax
    movq    %rax, -88(%rbp)
    movq    -48(%rbp), %rdi
    testq   %rdi, %rdi
    je  LBB1_1
## BB#3:                                ## %_ZNKSt3__18functionIFvPKcEEclES2_.exit.i.1
                                        ##   in Loop: Header=BB1_2 Depth=1
#
# destructor called once (constant time, therefore irrelevant)
#
    movq    (%rdi), %rax
    movq    48(%rax), %rax
Ltmp5:
    leaq    -88(%rbp), %rsi
    callq   *%rax
Ltmp6:
## BB#4:                                ##   in Loop: Header=BB1_2 Depth=1
    leaq    L_.str2(%rip), %rax
    movq    %rax, -88(%rbp)
    movq    -48(%rbp), %rdi
    testq   %rdi, %rdi
    jne LBB1_5
#
# the rest of this function is exception handling. Executed at most 
# once, in exceptional circumstances. Therefore, irrelevant.
#
LBB1_1:                                 ##   in Loop: Header=BB1_2 Depth=1
    movl    $8, %edi
    callq   ___cxa_allocate_exception
    movq    __ZTVNSt3__117bad_function_callE@GOTPCREL(%rip), %rcx
    addq    $16, %rcx
    movq    %rcx, (%rax)
Ltmp10:
    movq    __ZTINSt3__117bad_function_callE@GOTPCREL(%rip), %rsi
    movq    __ZNSt3__117bad_function_callD1Ev@GOTPCREL(%rip), %rdx
    movq    %rax, %rdi
    callq   ___cxa_throw
Ltmp11:
    jmp LBB1_2
LBB1_9:                                 ## %.loopexit.split-lp
Ltmp12:
    jmp LBB1_10
LBB1_5:                                 ## %_ZNKSt3__18functionIFvPKcEEclES2_.exit.i.2
    movq    (%rdi), %rax
    movq    48(%rax), %rax
Ltmp7:
    leaq    -88(%rbp), %rsi
    callq   *%rax
Ltmp8:
## BB#6:
    movq    -48(%rbp), %rdi
    cmpq    %r15, %rdi
    je  LBB1_7
## BB#15:
    testq   %rdi, %rdi
    je  LBB1_17
## BB#16:
    movq    (%rdi), %rax
    callq   *40(%rax)
    jmp LBB1_17
LBB1_7:
    movq    -80(%rbp), %rax
    leaq    -80(%rbp), %rdi
    callq   *32(%rax)
LBB1_17:                                ## %_ZNSt3__18functionIFvPKcEED1Ev.exit
    cmpq    -32(%rbp), %rbx
    jne LBB1_19
## BB#18:                               ## %_ZNSt3__18functionIFvPKcEED1Ev.exit
    xorl    %eax, %eax
    addq    $72, %rsp
    popq    %rbx
    popq    %r14
    popq    %r15
    popq    %rbp
    retq

Can we stop arguing about performance now please?

Upvotes: 5

angelvet

Reputation: 59

You can add compile-time checking for template parameters by defining constraints.

This lets you catch such errors early, and there is no runtime overhead, since current compilers generate no code for a constraint.

For example, we can define this constraint:

template<class F, class T> struct CanCall 
{
  static void constraints(F f, T a) { f(a); }
  CanCall() { void(*p)(F, T) = constraints; }
};

CanCall checks (at compile time) that an F can be called with a T.

Usage:

template<typename F>
void printHello(F f)
{
  CanCall<F, const char*>();

  f("Hello!");
}

As a result, compilers also give readable error messages for a failed constraint.

Upvotes: 1
