Reputation: 972
I have a hot and critical path function (about 45% of cycles:ppp as per perf record) in my C++17 application that is not being inlined as I would expect. It's a tiny function -- it simply returns the value of an atomic pointer member. The disassembly confirms that the function is just four assembly instructions, including the retq. Furthermore, there is only a single caller of this function in the entire build. I've even declared this function as __attribute__((always_inline)). Yet, there's a call and return to this function being generated.
The caller is in file A and the callee is in file B.
Some additional notes:
- -O3 and -march=native
- const and doesn't access any static members
- -flto when linking
- const and not templates, all with a dozen or fewer x86 assembly instructions
Actually, I've simplified a bit -- there are actually two places where this lack of inlining is happening in my application. File B has a function F1, which calls File A's F2, which calls File B's F3 (F2 and F3 are the ones listed above).
File A:
F2() {
F3();
}
File B:
F1() {
F2();
}
F3() {}
How can I get all of these to inline into one function? Another more fundamental question: can a function defined in a different file be inlined (perhaps using LTO)?
Upvotes: 0
Views: 1160
Reputation: 120
"Inline functions are defined in the header because, in order to inline a function call, the compiler must be able to see the function body. For a naive compiler to do that, the function body must be in the same translation unit as the call." A translation unit is a source file together with its headers, compiled as one unit. So in this case you could try defining the function in the same source file as the caller, or in a header that the caller includes.
Upvotes: 0
Reputation:
Some compilers (GCC, ICC, but not Clang) will never inline functions with public visibility on ELF targets when building a shared object (-fPIC flag). This is due to the possibility that the function will be replaced by a new one in the main executable.
If you want them to be inlined, you can try the following:
- Declare the functions inline.
- Use the -fno-semantic-interposition flag, which disables this behavior.
- Use protected visibility via -fvisibility=protected or __attribute__((visibility("protected"))). This doesn't always work, because it might make the compiler generate a relocation that cannot be handled by the linker.
- Use hidden (-fvisibility=hidden) visibility if those functions do not need to be visible outside of your shared object.
GCC documentation on -fsemantic-interposition should make some things clear:
Some object formats, like ELF, allow interposing of symbols by the dynamic linker. This means that for symbols exported from the DSO, the compiler cannot perform interprocedural propagation, inlining and other optimizations in anticipation that the function or variable in question may change. While this feature is useful, for example, to rewrite memory allocation functions by a debugging implementation, it is expensive in the terms of code quality.
Note the part about side effects, which explains how -fno-semantic-interposition offers the compiler guarantees similar to those of inline:
With -fno-semantic-interposition the compiler assumes that if interposition happens for functions the overwriting function will have precisely the same semantics (and side effects). Similarly if interposition happens for variables, the constructor of the variable will be the same. The flag has no effect for functions explicitly declared inline (where it is never allowed for interposition to change semantics) and for symbols explicitly declared weak.
You can also check my answer here.
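As a rough illustration of the hidden-visibility option, here is a minimal sketch. The names load_counter, read_twice, and demo are hypothetical, and the build command in the comment is only one plausible way to build a shared object with GCC.

```cpp
// Sketch only: load_counter, read_twice, and demo are made-up names.
// One possible build as a shared object: g++ -O2 -fPIC -shared lib.cpp -o libdemo.so
#include <cassert>

// Hidden visibility keeps this helper out of the shared object's dynamic
// symbol table, so it cannot be interposed and stays a candidate for inlining.
__attribute__((visibility("hidden")))
int load_counter(const int* p) { return *p; }

// Exported entry point (default visibility) that calls the hidden helper.
int read_twice(const int* p) { return load_counter(p) + load_counter(p); }

// Tiny usage check.
void demo() {
    int x = 21;
    assert(read_twice(&x) == 42);
}
```

Whether the call is actually inlined still depends on the optimization level and on the compiler's own heuristics; inspecting the disassembly is the reliable way to confirm it.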
Upvotes: 1
Reputation: 264729
PS
The always_inline attribute probably does not mean what you think it means. Normally g++ does not inline anything when no optimizations are turned on (I assume because this makes debugging easier). Adding this attribute (always_inline) makes the compiler inline even when not optimizing (probably not what you want), but it does not turn a function that could not be inlined into one that can or will be.
see: https://gcc.gnu.org/onlinedocs/gcc/Inline.html
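A minimal sketch of the case where the attribute can take effect: the function is declared inline and its body is visible at the call site. The names get_value, use_value, and demo are hypothetical.

```cpp
// Sketch only: get_value, use_value, and demo are made-up names.
#include <cassert>

// The body is in the same translation unit as the caller and the function is
// declared inline, so the attribute has something to work with.
inline __attribute__((always_inline)) int get_value(const int* p) { return *p; }

// With GCC, the call below is expected to be expanded in place even at -O0.
int use_value(const int* p) { return get_value(p) + 1; }

// Tiny usage check.
void demo() {
    int x = 41;
    assert(use_value(&x) == 42);
}
```

If get_value were only declared here and defined in another source file, the attribute alone would not give the compiler a body to inline.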
Given your comments you have the following:
File A.h
void F2();
File B.h
void F1();
void F3() __attribute__((always_inline));
File A.cpp
#include "A.h"
#include "B.h"
void F2() {
F3();
}
File B.cpp
#include "B.h"
#include "A.h"
void F1() {
F2();
}
void F3() {}
In the future, that is the minimal example you should submit, as it has all the type information and is enough to rebuild your situation.
The code you provided is not compilable, and it takes a lot of cognitive load to unwind your English description into compilable code.
If you have set up your compiler appropriately, F3() can be inlined into A.cpp, but that may not always be the case. For that kind of optimization, either the translation unit must have access to the source of F3(), or your toolchain must be able to optimize across translation units.
You can simplify this by moving the body of F3() into the header file. Then it will be available for inlining directly to the translation unit.
File A.h
void F2();
File B.h
void F1();
void F3() __attribute__((always_inline)); // I would not add this.
// Let the compiler not inline in debug mode.
inline void F3() {}
File A.cpp
#include "A.h"
#include "B.h"
void F2() {
F3();
}
File B.cpp
#include "B.h"
#include "A.h"
void F1() {
F2();
}
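Applied to the kind of accessor the question describes (a getter returning an atomic pointer member), the same idea might look like the sketch below. Node, Holder, current, caller, and demo are hypothetical names, and the acquire ordering is an assumption, since the question does not say which memory order is used.

```cpp
// Sketch only: all names here are made up for illustration.
#include <atomic>
#include <cassert>

struct Node { int value; };

// Would live in a header included by every caller.
class Holder {
public:
    explicit Holder(Node* n) : ptr_(n) {}

    // Defined inside the class body, so it is implicitly inline: every
    // translation unit that includes the header sees the body.
    Node* current() const { return ptr_.load(std::memory_order_acquire); }

private:
    std::atomic<Node*> ptr_;
};

// Would live in a separate .cpp file that includes the header.
int caller(const Holder& h) { return h.current()->value; }

// Tiny usage check.
void demo() {
    Node n{42};
    Holder h(&n);
    assert(caller(h) == 42);
}
```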
Upvotes: 3
Reputation: 85541
The standard way to have inline functions that are actually inlined is to define them in the same translation unit (for example, in the header file), and use the inline specifier.
There are no provisions in Standard C++ for inlining functions across translation units, but sometimes it can be done by a compiler as part of an LTO (IPO, WPO) extension.
ICC calls it Interprocedural Optimization (IPO) and the compile flag you're looking for is -ipo.
See also Using IPO.
Note: there is also -inline-level=2 but that is already set from -O2 onwards.
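For comparison, with GCC the equivalent of -ipo is -flto, and it needs to be passed both when compiling each source file and at the final link. The sketch below puts the question's F1/F2/F3 chain in one listing, with comments marking which file each part would live in. The int return values, the demo function, and the build commands are assumptions added for illustration.

```cpp
// Sketch only. Possible GCC build if the parts are split into real files:
//   g++ -O3 -flto -c A.cpp
//   g++ -O3 -flto -c B.cpp
//   g++ -O3 -flto A.o B.o main.o -o app
#include <cassert>

// --- would be in B.cpp ---
int F3() { return 7; }           // leaf function whose body the optimizer needs

// --- would be in A.cpp ---
int F3();                        // A.cpp only sees this declaration (via B.h)
int F2() { return F3() + 1; }

// --- would be in B.cpp ---
int F2();                        // B.cpp only sees this declaration (via A.h)
int F1() { return F2() * 2; }

// Tiny usage check: (7 + 1) * 2 == 16.
void demo() {
    assert(F1() == 16);
}
```

Without link-time optimization, each call that crosses a file boundary stays a real call, because the compiler never sees the callee's body while compiling the caller.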
Upvotes: 2