PCaetano

Reputation: 601

C++ Compiler optimization on a function template that returns its argument

Using Boost Locale/ICU, I've created a solution for the problems I had outputting non-ASCII characters to the Windows console (cmd) when using Mingw.

Now I've decided to give it a go with Visual Studio, only to find out that calling std::locale::global(std::locale("")) results in correct non-ASCII output on cmd, so there's no need for my solution there.

For now, that code hits an #error directive on VS, but I'd like it to be more portable: the same code should build on both VS and Mingw, and do nothing on VS.

The obvious solution is the preprocessor, something like this (simplified, I'm leaving out stuff like do {} while(0)):

#if defined(_MSC_VER)
#define SOME_HOPEFULLY_UNIQUE_PREFIX_CONVERT_OUTPUT(x) x
#else
#define SOME_HOPEFULLY_UNIQUE_PREFIX_CONVERT_OUTPUT(x) ConvertOutput(x)
#endif

Then I wondered if I could achieve the same result with a function template that just returned its argument. Something like this:

template <typename T>
T ConvertOutput(T t)
{
    return t;
}

A simple test with char* went as expected (x64 Release configuration on MSVC Community 2015): the call to ConvertOutput() was elided.

lea  rdx,[string "teste" (013F79761Ch)]  
mov  rcx,qword ptr [__imp_std::cout (013F797178h)]  
call std::operator<<<std::char_traits<char> > (013F791690h)  
mov  dl,0Ah  
mov  rcx,rax  
call std::operator<<<std::char_traits<char> > (013F791870h)  

But the same simple test with std::string shows that while we get RVO, there's still a temporary being built and a call to ConvertOutput():

124: std::string b1{"teste2"};
mov  qword ptr [rsp+88h],0Fh  
mov  qword ptr [rsp+80h],0  
mov  byte ptr [b1],0  
mov  r8d,6  
lea  rdx,[string "teste2" (013F127624h)]  
lea  rcx,[b1]  
call std::basic_string<char,std::char_traits<char>,
         std::allocator<char> >::assign (013F1210C0h)  
nop  

125: auto b2 = ConvertOutput(b1);
mov  qword ptr [rsp+38h],0Fh  
mov  qword ptr [rsp+30h],0  
mov  byte ptr [rsp+20h],0  
or   r9,0FFFFFFFFFFFFFFFFh  
xor  r8d,r8d  
lea  rdx,[b1]  
lea  rcx,[rsp+20h]  
call std::basic_string<char,std::char_traits<char>,
         std::allocator<char> >::assign (013F1211F0h)  
lea  rdx,[rsp+20h]  
lea  rcx,[b2]  
call ConvertOutput<std::basic_string<char,std::char_traits<char>,
         std::allocator<char> > > (013F124090h)  
nop  

I had some hope that the compiler, knowing that all ConvertOutput() does is return its argument, could elide it here as well. I realize that might be unwise, because the copy constructor of an arbitrary T could have a desired side effect, but since the instantiation is for std::string, I expected the compiler to have more wiggle room with standard library classes.
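To make the copy concern concrete, here is a minimal sketch of what the by-value template does at the language level. Tracked and CopiesForByValueCall are hypothetical names introduced only for illustration; they are not part of my actual code, and the counts say nothing about what a particular optimizer does with std::string.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper type (an assumption for illustration): counts copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    std::string value;

    explicit Tracked(std::string v) : value(std::move(v)) {}
    Tracked(Tracked const& other) : value(other.value) { ++copies; }
    Tracked(Tracked&& other) noexcept : value(std::move(other.value)) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// The by-value identity template from the question.
template <typename T>
T ConvertOutput(T t)
{
    return t;   // a by-value parameter is moved out, not elided
}

// Runs the by-value call once and reports how many copies it made.
int CopiesForByValueCall()
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked b1{"teste2"};
    auto b2 = ConvertOutput(b1);   // b1 is copied into the parameter t
    assert(b2.value == b1.value);
    return Tracked::copies;
}
```

Passing an lvalue to a by-value parameter requires a copy, so the function cannot be a true no-op for class types unless the optimizer can prove the copy is unobservable.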

Specializing ConvertOutput() for std::string gave a similar result - the temporary goes away if ConvertOutput() takes a reference, but the call is still there.

As a final attempt, I overloaded ConvertOutput() like this:

template <typename CharT>
CharT const* ConvertOutput(std::basic_string<CharT> const &t)
{
    std::cout << "Ref: " << t << '\n';
    return t.c_str();
}

And I finally got the behaviour I expected, including the elision/inlining of the ConvertOutput() call:

132: std::string b1{"teste2"};
mov    qword ptr [rsp+40h],0Fh  
mov    qword ptr [rsp+38h],0  
mov    byte ptr [b1],0  
mov    r8d,6  
lea    rdx,[string "teste2" (013F697624h)]  
lea    rcx,[b1]  
call   std::basic_string<char,std::char_traits<char>,
           std::allocator<char> >::assign (013F6910C0h)  
nop  

133: auto b2 = ConvertOutput(b1);
lea    rdx,[string "Ref: " (013F697730h)]  
mov    rcx,qword ptr [__imp_std::cout (013F697178h)]  
call   std::operator<<<std::char_traits<char> > (013F691690h)  
mov    rcx,rax  
lea    rdx,[b1]  
call   std::operator<<<char,std::char_traits<char>,
           std::allocator<char> > (013F691A30h)  
mov    rcx,rax  
mov    dl,0Ah  
call   std::operator<<<std::char_traits<char> > (013F691870h)  
lea    rdx,[b1]  
cmp    qword ptr [rsp+40h],10h  
cmovae rdx,qword ptr [b1]  
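For reference, this is roughly how the overload coexists with the generic template (a sketch with the debug print removed; StringOverloadIsChosen is a hypothetical helper added only to check which overload is picked). Note that the returned pointer is only valid while the string is alive and unmodified.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Generic fallback from the question: returns its argument by value.
template <typename T>
T ConvertOutput(T t)
{
    return t;
}

// Overload for std::basic_string: no copy is made, only a pointer is returned.
template <typename CharT>
CharT const* ConvertOutput(std::basic_string<CharT> const& t)
{
    return t.c_str();
}

// Returns true if the basic_string overload was chosen for a std::string lvalue.
bool StringOverloadIsChosen()
{
    std::string b1{"teste2"};
    static_assert(std::is_same<decltype(ConvertOutput(b1)), char const*>::value,
                  "expected the basic_string overload");
    assert(ConvertOutput(42) == 42);   // other types still use the generic template
    return ConvertOutput(b1) == b1.c_str();
}
```

Both templates are exact matches for a std::string lvalue, so the choice comes down to partial ordering, which prefers the more specialized basic_string overload.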

I can see no way to achieve the same effect as the preprocessor macro with a template, at least not without a sizable number of caveats.

Am I wrong? Is there a way (simple or otherwise) to achieve this with templates, without overloading/specializing for each used type?

Upvotes: 1

Views: 76

Answers (1)

Jarod42

Reputation: 217085

How about using std::forward (from <utility>), so the argument is passed through by reference instead of being copied:

template <typename T>
T&& ConvertOutput(T&& t)
{
    return std::forward<T>(t);
}
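A minimal sketch of how this behaves, assuming C++11 or later (ReturnsSameObject is a hypothetical helper used only to check the result):

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>   // std::forward, std::declval

template <typename T>
T&& ConvertOutput(T&& t)
{
    return std::forward<T>(t);
}

// An lvalue comes back as an lvalue reference, an rvalue as an rvalue reference.
static_assert(std::is_same<decltype(ConvertOutput(std::declval<std::string&>())),
                           std::string&>::value, "lvalue in, lvalue reference out");
static_assert(std::is_same<decltype(ConvertOutput(std::declval<std::string>())),
                           std::string&&>::value, "rvalue in, rvalue reference out");

// Returns true if the call hands back the very same object it was given.
bool ReturnsSameObject()
{
    std::string b1{"teste2"};
    std::string const& ref = ConvertOutput(b1);  // no copy: ref refers to b1
    auto b2 = ConvertOutput(b1);                 // auto drops the reference, so this copies
    assert(b2 == b1);
    return &ref == &b1 && &b2 != &b1;
}
```

Since nothing is constructed inside the function, there is no copy for the compiler to elide, and something like std::cout << ConvertOutput(b1) works directly on b1. One caveat: binding the result of a call on a temporary to a reference, as in auto const& r = ConvertOutput(std::string("tmp")), leaves r dangling, because the temporary is destroyed at the end of the full expression.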

Upvotes: 4
