Reputation: 75
In the C++ standard library, there are many one-liner function templates. E.g. std::move is essentially just a cast, and an implementation may be:
template<typename _Tp>
constexpr typename std::remove_reference<_Tp>::type&&
move(_Tp&& __t) noexcept
{ return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }
I know that in practice no machine code will be generated for std::move, as it's just a cast. My question is: Is there any guarantee in the standard saying that functions like std::move or std::forward (which do nothing more than cast) must always be inlined (so no machine code is generated)? In other words, is it possible for a (pedantic) compiler to treat them as normal functions (i.e., put the argument on the stack, and generate call and ret instructions)?
Upvotes: 0
Views: 193
Reputation: 126827
My question is: Is there any guarantee in the standard saying that functions like std::move or std::forward (which do nothing more than cast) must always be inlined (so no machine code is generated)?
No. The standard stops at describing the observable behavior of an abstract machine. Code generation is an implementation detail about which the standard knows nothing.
That being said, both std::forward and std::move are operations that really only affect the C++ type system, not the actual data, so I'd be extremely surprised to see any machine code generated for them in optimized builds.
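To make the "only affects the type system" point concrete, here is a minimal sketch. The helper names move_is_only_a_cast and forward_keeps_address are made up for this illustration. It checks that the reference returned by std::move or std::forward names the very same object and that the object itself is left untouched:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// std::move on a std::string lvalue has type std::string&&:
// only the type / value category of the expression changes.
static_assert(std::is_same<decltype(std::move(std::declval<std::string&>())),
                           std::string&&>::value,
              "std::move yields an rvalue reference");

// Hypothetical helper: true if std::move alone neither relocates
// nor modifies the object.
bool move_is_only_a_cast() {
    std::string s = "hello";
    std::string&& r = std::move(s);   // no move constructor runs here
    return &r == &s && s == "hello";  // same object, contents intact
}

// Hypothetical helper: std::forward also hands back a reference to the
// same object, for lvalue and rvalue arguments alike.
template <typename T>
bool forward_keeps_address(T&& arg) {
    T&& fwd = std::forward<T>(arg);
    return &fwd == &arg;
}
```

The actual "move" only happens when the resulting rvalue reference is passed to a move constructor or move assignment operator, which is why the cast itself needs no machine code once it is inlined.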
On the other hand, in non-optimized builds, leaving std::move outlined (like pretty much any other function call) can be a good idea to ease debugging. You can easily test this (live on gcc.godbolt):
#include <utility>

struct Foo {
    int i;
    Foo(Foo &&other) : i(other.i) {}
};

Foo with_move(Foo f) {
    return std::move(f);
}
In gcc with -O0, std::move is generated as an actual function (one that does nothing besides setting up and tearing down the stack frame and returning the pointer argument it received):
Foo::Foo(Foo&&):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-16]
mov edx, DWORD PTR [rax]
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], edx
nop
pop rbp
ret
with_move(Foo):
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov QWORD PTR [rbp-16], rsi
mov rax, QWORD PTR [rbp-16]
mov rdi, rax
call std::remove_reference<Foo&>::type&& std::move<Foo&>(Foo&)
mov rdx, rax
mov rax, QWORD PTR [rbp-8]
mov rsi, rdx
mov rdi, rax
call Foo::Foo(Foo&&)
mov rax, QWORD PTR [rbp-8]
leave
ret
std::remove_reference<Foo&>::type&& std::move<Foo&>(Foo&):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
pop rbp
ret
while even at -O1 everything gets inlined:
with_move(Foo):
mov rax, rdi
mov edx, DWORD PTR [rsi]
mov DWORD PTR [rdi], edx
ret
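If the extra call in unoptimized builds matters, one option is to spell out the cast that std::move performs. The sketch below is only an illustration: with_cast is a made-up name, and the Foo(int) constructor is added just so the type can be constructed in a test. It shows the two forms are interchangeable in behavior; the static_cast version contains no function call for the compiler to emit, regardless of optimization level.

```cpp
#include <cassert>
#include <utility>

struct Foo {
    int i;
    explicit Foo(int v) : i(v) {}       // added for illustration only
    Foo(Foo&& other) : i(other.i) {}
};

// Uses the library function template; at -O0 this may be a real call.
Foo with_move(Foo f) {
    return std::move(f);
}

// Same semantics written as the cast std::move boils down to.
Foo with_cast(Foo f) {
    return static_cast<Foo&&>(f);
}
```

Both functions end up invoking Foo::Foo(Foo&&) on the parameter; the only difference is whether an intermediate std::move instantiation exists that the compiler may or may not inline.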
Upvotes: 2
Reputation: 26800
This is not from the standard but from Scott Meyers' Effective C++. Some excerpts from Item 30: Understand the ins and outs of inlining.
Compiler optimizations are typically designed for stretches of code that lack function calls, so when you inline a function, you may enable compilers to perform context-specific optimizations on the body of the function. Most compilers never perform such optimizations on “outlined” function calls.
...
On machines with limited memory, overzealous inlining can give rise to programs that are too big for the available space. Even with virtual memory, inline-induced code bloat can lead to additional paging, a reduced instruction cache hit rate, and the performance penalties that accompany these things.
...
On the other hand, if an inline function body is very short, the code generated for the function body may be smaller than the code generated for a function call. If that is the case, inlining the function may actually lead to smaller object code and a higher instruction cache hit rate!
...
Bear in mind that inline is a request to compilers, not a command. ...
Template instantiation is independent of inlining. If you’re writing a template and you believe that all the functions instantiated from the template should be inlined, declare the template inline;
...
But if you’re writing a template for functions that you have no reason to want inlined, avoid declaring the template inline (either explicitly or implicitly). Inlining has costs, and you don’t want to incur them without forethought.
...
... let’s finish the observation that inline is a request that compilers may ignore. Most compilers refuse to inline functions they deem too complicated (e.g., those that contain loops or are recursive), and all but the most trivial calls to virtual functions defy inlining. It all adds up to this: whether a given inline function is actually inlined depends on the build environment you’re using, primarily on the compiler. Fortunately, most compilers have a diagnostic level that will result in a warning if they fail to inline a function you’ve asked them to.
Sometimes compilers generate a function body for an inline function even when they are perfectly willing to inline the function. For example, if your program takes the address of an inline function, compilers must typically generate an outlined function body for it.
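A small sketch of that last point, with made-up names (square, call_directly, call_through_pointer): the same inline function is used once through a direct call, which is a candidate for inline expansion, and once through a function pointer, which typically requires an out-of-line body to exist.

```cpp
#include <cassert>

// 'inline' is a request; the compiler decides whether to expand calls.
inline int square(int x) { return x * x; }

// Direct call: a candidate for inline expansion.
int call_directly(int x) {
    return square(x);
}

// Taking the address of square typically makes the compiler emit an
// outlined body, since the pointer has to point at real code.
int call_through_pointer(int x) {
    int (*fp)(int) = &square;
    return fp(x);
}
```

An optimizer may still see through the pointer in a case this simple, so what actually gets emitted is best checked in the generated assembly.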
If you can get hold of the book, read the whole item and it will clear up many of your doubts.
Upvotes: 0