user1084699

Reputation: 75

Inlineness guarantee of one-liner function templates c++

In the C++ standard library, there are many one-liner function templates. For example, std::move is essentially just a cast, and an implementation may look like this:

template<typename _Tp>
    constexpr typename std::remove_reference<_Tp>::type&&
    move(_Tp&& __t) noexcept
    { return static_cast<typename std::remove_reference<_Tp>::type&&>(__t); }

I know that in practice no machine code will be generated for std::move, as it's just a cast. My question is: is there any guarantee in the standard that functions like std::move or std::forward (which do nothing more than cast) must always be inlined, so that no machine code is generated? In other words, is a (pedantic) compiler allowed to treat them as normal functions (i.e., put the argument on the stack and generate call and ret instructions)?

Upvotes: 0

Views: 193

Answers (2)

Matteo Italia

Reputation: 126827

My question is: is there any guarantee in the standard that functions like std::move or std::forward (which do nothing more than cast) must always be inlined, so that no machine code is generated?

No. The standard stops at describing the observable behavior of an abstract machine. Code generation is an implementation detail about which the standard knows nothing.

That being said, both std::forward and std::move are operations that really only affect the C++ type system, not the actual data, so I'd be extremely surprised to see any machine code generated for them in optimized builds.
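To make the "type system only" point concrete, here is a minimal sketch. The helper name move_keeps_address is made up for this illustration and is not part of any library. It checks that the reference returned by std::move still refers to the original object, and uses static_assert to show that only the type of the expression changes:

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical helper (illustration only): returns true when the reference
// produced by std::move refers to the very same object that was passed in.
template <typename T>
bool move_keeps_address(T& obj) {
    T&& moved = std::move(obj);  // only a cast: nothing is copied or moved here
    return std::addressof(moved) == std::addressof(obj);
}

// std::move turns an lvalue expression into an rvalue reference type...
static_assert(
    std::is_same<decltype(std::move(std::declval<int&>())), int&&>::value,
    "std::move yields an rvalue reference");

// ...and std::forward yields the reference kind encoded in its template argument.
static_assert(
    std::is_same<decltype(std::forward<int&>(std::declval<int&>())), int&>::value,
    "forwarding an lvalue reference type yields an lvalue reference");
static_assert(
    std::is_same<decltype(std::forward<int>(std::declval<int&>())), int&&>::value,
    "forwarding a non-reference type yields an rvalue reference");
```

Calling move_keeps_address on any object should return true, and the object keeps its value, because no move constructor or move assignment operator is ever invoked.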

On the other hand, in non-optimized builds, leaving std::move outlined (like pretty much any other function call) can be a good idea to ease debugging. You can easily test this (live on gcc.godbolt):

#include <utility>

struct Foo {
    int i;
    Foo(Foo &&other) : i(other.i) {}
};

Foo with_move(Foo f) {
    return std::move(f);
}

In gcc with -O0, std::move is generated as an actual function (one that does nothing besides setting up and tearing down the stack frame and returning the pointer argument it received):

Foo::Foo(Foo&&):
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     QWORD PTR [rbp-16], rsi
        mov     rax, QWORD PTR [rbp-16]
        mov     edx, DWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-8]
        mov     DWORD PTR [rax], edx
        nop
        pop     rbp
        ret
with_move(Foo):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     QWORD PTR [rbp-8], rdi
        mov     QWORD PTR [rbp-16], rsi
        mov     rax, QWORD PTR [rbp-16]
        mov     rdi, rax
        call    std::remove_reference<Foo&>::type&& std::move<Foo&>(Foo&)
        mov     rdx, rax
        mov     rax, QWORD PTR [rbp-8]
        mov     rsi, rdx
        mov     rdi, rax
        call    Foo::Foo(Foo&&)
        mov     rax, QWORD PTR [rbp-8]
        leave
        ret
std::remove_reference<Foo&>::type&& std::move<Foo&>(Foo&):
        push    rbp
        mov     rbp, rsp
        mov     QWORD PTR [rbp-8], rdi
        mov     rax, QWORD PTR [rbp-8]
        pop     rbp
        ret

while even at -O1 everything gets inlined:

with_move(Foo):
        mov     rax, rdi
        mov     edx, DWORD PTR [rsi]
        mov     DWORD PTR [rdi], edx
        ret
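If the extra call in unoptimized builds matters to you, one possible workaround is to spell out the cast yourself, so that there is no function for the compiler to call. The sketch below is an illustration under assumptions: MY_MOVE and with_cast are hypothetical names, and the Foo here has an extra int constructor (not in the listing above) so that it can be constructed in a test. It needs C++14 for std::remove_reference_t.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical macro (not part of any library): performs the same cast that
// std::move performs, written in place so that no function call is involved.
#define MY_MOVE(...) \
    static_cast<std::remove_reference_t<decltype(__VA_ARGS__)>&&>(__VA_ARGS__)

struct Foo {
    int i;
    explicit Foo(int v) : i(v) {}      // added for this sketch only
    Foo(Foo&& other) : i(other.i) {}
};

// Same shape as with_move above, but using the cast directly.
Foo with_cast(Foo f) {
    return MY_MOVE(f);
}

// The macro produces the same expression type as std::move.
static_assert(
    std::is_same<decltype(MY_MOVE(std::declval<Foo&>())), Foo&&>::value,
    "MY_MOVE yields an rvalue reference, like std::move");
```

The trade-off is readability: std::move states intent and is what most readers expect, so a macro like this is probably only worth considering when debug-build performance has been measured to be a problem.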

Upvotes: 2

P.W

Reputation: 26800

This is not from the standard but from Scott Meyers's Effective C++. Some excerpts from Item 30: Understand the ins and outs of inlining.

Compiler optimizations are typically designed for stretches of code that lack function calls, so when you inline a function, you may enable compilers to perform context-specific optimizations on the body of the function. Most compilers never perform such optimizations on “outlined” function calls.
...
On machines with limited memory, overzealous inlining can give rise to programs that are too big for the available space. Even with virtual memory, inline-induced code bloat can lead to additional paging, a reduced instruction cache hit rate, and the performance penalties that accompany these things.
...
On the other hand, if an inline function body is very short, the code generated for the function body may be smaller than the code generated for a function call. If that is the case, inlining the function may actually lead to smaller object code and a higher instruction cache hit rate!
...
Bear in mind that inline is a request to compilers, not a command. ...
Template instantiation is independent of inlining. If you’re writing a template and you believe that all the functions instantiated from the template should be inlined, declare the template inline;
...
But if you’re writing a template for functions that you have no reason to want inlined, avoid declaring the template inline (either explicitly or implicitly). Inlining has costs, and you don’t want to incur them without forethought.
...
... let’s finish the observation that inline is a request that compilers may ignore. Most compilers refuse to inline functions they deem too complicated (e.g., those that contain loops or are recursive), and all but the most trivial calls to virtual functions defy inlining.

It all adds up to this: whether a given inline function is actually inlined depends on the build environment you’re using — primarily on the compiler. Fortunately, most compilers have a diagnostic level that will result in a warning if they fail to inline a function you’ve asked them to.

Sometimes compilers generate a function body for an inline function even when they are perfectly willing to inline the function. For example, if your program takes the address of an inline function, compilers must typically generate an outlined function body for it.
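The address-taking case from the last excerpt can be sketched as follows. All names here (twice, call_directly, address_of_twice) are hypothetical and chosen only for illustration. What the compiler actually emits still depends on the compiler and the optimization level.

```cpp
#include <cassert>

// Hypothetical example function. `inline` is a request, not a command.
inline int twice(int x) { return 2 * x; }

// Direct call: the compiler is free to substitute `2 * x` here and emit
// no call instruction at all.
int call_directly(int x) { return twice(x); }

using IntFn = int (*)(int);

// Taking the address: a function pointer has to point at real code, so the
// compiler will typically emit an outlined body for twice as well.
IntFn address_of_twice() { return &twice; }
```

Both paths compute the same result. The difference is only in the generated code: a call made through the pointer returned by address_of_twice usually cannot be inlined unless the optimizer can see which function the pointer refers to.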

If you can get hold of the book, read the whole item and it will clear up many of your doubts.

Upvotes: 0
