Reputation: 400
Using gcc v4.8.1
If I do:
//func.hpp
#ifndef FUNC_HPP
#define FUNC_HPP
int func(int);
#endif

//func.cpp
#include "func.hpp"
int func(int x){
    return 5*x+7;
}

//main.cpp
#include <iostream>
#include "func.hpp"
using std::cout;
using std::endl;
int main(){
    cout<<func(5)<<endl;
    return 0;
}
Even the simple function func will not get inlined. No combination of inline, extern, static, and __attribute__((always_inline)) on the prototype and/or the definition changes this (obviously some combinations of these specifiers cause it to not even compile and/or produce warnings; I'm not talking about those). I'm using g++ *.cpp -O3 -o run and g++ *.cpp -O3 -S for assembly output. When I look at the assembly output, I still see call func. It appears the only way I can get the function to be properly inlined is to have the prototype (probably not necessary) and the definition of the function in the header file. If the header is only included by one file in the whole program (included by only main.cpp, for example), it will compile and the function will be properly inlined without even needing the inline specifier. If the header is to be included by multiple files, the inline specifier appears to be needed to resolve multiple definition errors, and that appears to be its only purpose. The function is of course inlined properly.
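For reference, the header-only layout described above might look like the following. This is a minimal sketch that reuses the func name and include guard from the question; whether the call is actually inlined is still up to the optimizer.

```cpp
// func.hpp (sketch): the definition lives in the header, so every file
// that includes it sees the full body and the compiler is able to inline it.
#ifndef FUNC_HPP
#define FUNC_HPP

// `inline` allows this same definition to appear in several translation
// units without a multiple-definition error at link time.
inline int func(int x) {
    return 5 * x + 7;
}

#endif  // FUNC_HPP
```

With this layout func.cpp is no longer needed, and main.cpp stays exactly as it is.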
So my question is: am I doing something wrong? Am I missing something? Whatever happened to:
"The compiler is smarter than you. It knows when a function should be inlined better than you do. And never ever use C arrays. Always use std::vector!"
-Every other StackOverflow user
Really? So calling func(5) and printing the result is faster than just printing 32? I will blindly follow you off the edge of a cliff, almighty, all-knowing, and all-wise gcc.
For the record, the above code is just an example. I am writing a ray tracer, and when I moved all of the code of my math and other utility classes to their header files and used the inline specifier, I saw massive performance gains. Literally like 10 times faster for some scenes.
Upvotes: 6
Views: 1378
Reputation: 1
Recent GCC is able to inline across compilation units through link-time optimization (LTO). You need to compile, and link, with -flto; see Link-time optimization and inline and GCC optimize options.
(Actually, LTO is done at link time by a special variant of the compiler, lto1. LTO works by serializing some of GCC's internal representations inside the object files, and lto1 reads them back. So when you compile src1.c with -flto, the generated src1.o contains the GIMPLE representation in addition to the object binary; and when you link with gcc -flto src*.o, the lto1 "front-end" extracts those GIMPLE representations from the src*.o files and almost recompiles everything again...)
You need to explicitly pass -flto both at compile time AND at link time (see this). If using a Makefile, you could try make CC='gcc -flto'; otherwise, compile each translation unit with e.g. gcc -Wall -flto -O2 -c src1.c (and likewise for src2.c etc.) and link all of your program (or library) with gcc -Wall -flto -O2 src1.o src2.o -o prog -lsomelib
Notice that -flto will significantly slow down your build (it is not enabled by -O3, so you need to pass it explicitly, and you need to link with it too). Often you get a 5% or 10% performance improvement in the built program at the expense of nearly doubling the build time. Sometimes you can get more improvement.
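Applied to the files from the question, the build could be sketched as below. The commands in the comment assume a GCC built with LTO support and use g++ because the sources are C++. The object file names are illustrative, and whether call func really disappears depends on the compiler version and its inlining heuristics, so it is worth checking the generated assembly.

```cpp
// func.cpp, unchanged from the question: the body stays out of the header.
//
// Build sketch (note -flto on BOTH the compile steps and the link step):
//   g++ -O3 -flto -c func.cpp -o func.o
//   g++ -O3 -flto -c main.cpp -o main.o
//   g++ -O3 -flto func.o main.o -o run
//
// The declaration below is what #include "func.hpp" provides.
int func(int);

int func(int x) {
    return 5 * x + 7;
}
```

The source files themselves do not change; only the compiler and linker flags do.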
Upvotes: 10
Reputation: 544
The inline keyword is nothing more than a suggestion to the compiler: "I want this function to be inlined". The compiler can ignore it without even a warning.
In order for your function func(...) to be inlined, your compiler/linker HAVE TO support some form of link-time code generation (and optimization). Because func() and main() lie in different translation units, the C++ compiler can't see them both at the same time, and therefore can't inline one function into the other. It NEEDS LINKER SUPPORT to do so.
Consult your build tool manuals on how to switch link-time code generation features on, if they are supported at all.
Upvotes: 1
Reputation: 409442
The compiler can't inline what it doesn't have. It needs the full body of the function to inline its code.
You have to remember that the compiler only works on one source file at a time (more precisely, one translation unit at a time), and has no idea about other source files and what's in them.
The linker might be able to do it though, as it sees all the code, and some linkers have flags that allow some link-time optimizations.
Upvotes: 3