Carucel

Reputation: 481

A custom operator overload in C++ without overhead

I want to be able to define a custom operator that works like this:

struct Matrix {
    int inner[5];
    static Matrix tensor_op(Matrix const& a, Matrix const& b);
};

int main()
{
    Matrix a;
    Matrix b;
    Matrix c = a tensor c;
    return 1;
}

The following code works perfectly, except that the intermediate object does not get optimized away:

template<char op> struct extendedop { }; 
template<typename T, char op>
struct intermediate {
    T const& a;
    intermediate(T const& a_) : a(a_) {}
}; 
template<typename T, char op>
intermediate<T, op> operator+(T const& a, extendedop<op> exop) {
    return intermediate<T, op>(a);
}
template<typename T>
T operator+(intermediate<T, '*'> const& a, T const& b) {
    return T::tensor_op(a.a, b);
}
#define tensor + extendedop<'*'>() +

As you can see in the assembly output produced by GCC and MSVC, only GCC seems to optimize the intermediate object away.

How can I make MSVC optimize away the unnecessary code?

Upvotes: 1

Views: 94

Answers (1)

Joel Filho

Reputation: 1300

MSVC does optimize it. Look at the generated code for main() (compiled with /GS- to remove the security check, just to make it clearer):

$LN10:
  sub rsp, 120 ; 00000078H
  lea r8, QWORD PTR c$[rsp]
  lea rdx, QWORD PTR a$[rsp]
  lea rcx, QWORD PTR $T1[rsp]
  call static Matrix Matrix::tensor_op(Matrix const &,Matrix const &) ; Matrix::tensor_op
  mov eax, 1
  add rsp, 120 ; 00000078H
  ret 0
main ENDP

That's the same number of instructions as GCC. The construction of the intermediate object is elided.

The extra assembly shown on Compiler Explorer is just MSVC emitting standalone code for the operator templates in case they are used independently; it does not strip them from the listing when they end up unused. It's probably just a side effect of MSVC not having direct assembly output the way GCC and Clang do.
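If you want something complete to paste into Compiler Explorer and compare yourself, here is a minimal sketch built from the question's code. The body of tensor_op is an assumption on my part (an element-wise product), since the question only declares it, and macro_matches_direct_call is a hypothetical helper name. It only checks that the macro form computes the same result as the direct call; it says nothing about code generation by itself.

```cpp
// The operator trick from the question, unchanged in behavior.
template<char op> struct extendedop { };

template<typename T, char op>
struct intermediate {
    T const& a;  // holds only a reference to the left operand, no copy
    intermediate(T const& a_) : a(a_) {}
};

// Left half of the "operator": a + extendedop<op>() wraps a.
template<typename T, char op>
intermediate<T, op> operator+(T const& a, extendedop<op>) {
    return intermediate<T, op>(a);
}

// Right half: intermediate + b forwards to T::tensor_op.
template<typename T>
T operator+(intermediate<T, '*'> const& a, T const& b) {
    return T::tensor_op(a.a, b);
}

#define tensor + extendedop<'*'>() +

struct Matrix {
    int inner[5];
    // ASSUMPTION: placeholder body (element-wise product);
    // the question does not show the real implementation.
    static Matrix tensor_op(Matrix const& a, Matrix const& b) {
        Matrix r{};
        for (int i = 0; i < 5; ++i) r.inner[i] = a.inner[i] * b.inner[i];
        return r;
    }
};

// Hypothetical helper: true if `a tensor b` equals Matrix::tensor_op(a, b).
bool macro_matches_direct_call() {
    Matrix a{{1, 2, 3, 4, 5}};
    Matrix b{{5, 4, 3, 2, 1}};
    Matrix via_macro = a tensor b;
    Matrix direct = Matrix::tensor_op(a, b);
    for (int i = 0; i < 5; ++i)
        if (via_macro.inner[i] != direct.inner[i]) return false;
    return true;
}
```

Call macro_matches_direct_call() from main() and build with optimizations on (/O2 for MSVC, -O2 for GCC), then look only at the code emitted for the caller rather than at the standalone operator bodies in the listing.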

Upvotes: 1
