akakatak

Reputation: 595

GCC can't optimize a delegated lambda function involving member function pointers

I compiled the following C++14 code with GCC 4.9.2 and clang 3.6.0, using the -O3 flag.

#include <utility>

struct S
{
    int a;
    int A () const { return a; }
};

template <class F, class ... Args>
int Func (F && f, Args && ... args)
{
    return f(std::forward<Args>(args) ...);
}

using PtrA = int (S::*)() const;

int F (S const & s, PtrA ptr) { return (s.*ptr)() * 5; }

int p (S const & s) { return s.A() * 5; }

int P1 (S const & s) { return Func(&F, s , &S::A); }
int P2 (S const & s) { return Func([](S const & s, auto f) { return (s.*f)() * 5; }, s, &S::A); }
int P3 (S const & s) { return ([](S const & s, auto f) { return (s.*f)() * 5; })(s, &S::A); }
int P4 (S const & s) { return Func([](S const & s) { return s.A() * 5; }, s); }

Func calls the delegated function object, forwarding the remaining arguments to it.

P1, P2, P3 and P4 all compute the same result as p, each in a different way. P1 delegates to F, a function that takes a member function pointer (PtrA). P2 delegates to a lambda that does the same thing as F. P3 calls that lambda directly. P4 delegates to a different lambda that does not use a member function pointer.
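To make the "same result" claim concrete, here is a minimal sketch that repeats the definitions from the question and compares all four variants against p. The helper AllAgree is hypothetical and only added for illustration; it is not part of the code that was disassembled.

```cpp
#include <utility>

struct S
{
    int a;
    int A () const { return a; }
};

// Forwards the arguments to the delegated function object.
template <class F, class ... Args>
int Func (F && f, Args && ... args)
{
    return f(std::forward<Args>(args) ...);
}

using PtrA = int (S::*)() const;

int F (S const & s, PtrA ptr) { return (s.*ptr)() * 5; }

int p (S const & s) { return s.A() * 5; }

int P1 (S const & s) { return Func(&F, s , &S::A); }
int P2 (S const & s) { return Func([](S const & s, auto f) { return (s.*f)() * 5; }, s, &S::A); }
int P3 (S const & s) { return ([](S const & s, auto f) { return (s.*f)() * 5; })(s, &S::A); }
int P4 (S const & s) { return Func([](S const & s) { return s.A() * 5; }, s); }

// Hypothetical helper (not in the original code): true when P1..P4
// return the same value as p for the given object.
bool AllAgree (S const & s)
{
    int const expected = p(s);
    return P1(s) == expected
        && P2(s) == expected
        && P3(s) == expected
        && P4(s) == expected;
}
```

Calling AllAgree(S{7}) from main should return true, since every variant computes s.a * 5. This only checks observable behaviour; it says nothing about the generated code.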

The objdump output for this code compiled with GCC is:

// p, P1, P3, P4
mov    (%rdi),%eax
lea    (%rax,%rax,4),%eax
retq   


// P2
sub    $0x8,%rsp
callq  49 <_Z2P2RK1S+0x9> // this address points to "add" in the next line.
add    $0x8,%rsp
lea    (%rax,%rax,4),%eax
retq

For p, P1, P3 and P4, GCC generates good code. Oddly, for P2 it generates noticeably worse code: the call is not inlined.

On the other hand, clang outputs

// p, P1, P2, P3, P4
imul   $0x5,(%rdi),%eax
retq

Although this is not ideal either, the output is at least the same for all functions.

My question is whether there is a legitimate reason that GCC can't optimize this, i.e. whether p and P1 to P4 actually differ as C++ programs.

If the answer to the first question is no, is this a known bug in the GCC optimizer?

This question is motivated by a Japanese article

Upvotes: 4

Views: 155

Answers (1)

user703016

Reputation: 37975

It seems that GCC 5 and later are in fact capable of this optimization; see gcc.godbolt.org:

P2(S const&):
    mov eax, DWORD PTR [rdi]
    lea eax, [rax+rax*4]
    ret

Upvotes: 2
