Reputation: 3753
The training materials from the class I took seem to be making two conflicting statements.
On one hand:
"Use of inline functions usually results in faster execution"
On the other hand:
"Use of inline functions may decrease performance due to more frequent swapping"
Question 1: Are both statements true?
Question 2: What is meant by "swapping" here?
Please glance at this snippet:
int powA(int a, int b) {
    return (a + b)*(a + b) ;
}
inline int powB(int a, int b) {
    return (a + b)*(a + b) ;
}
int main () {
    Timer *t = new Timer;
    for(int a = 0; a < 9000; ++a) {
        for(int b = 0; b < 9000; ++b) {
            int i = (a + b)*(a + b); // 322 ms <-----
            // int i = powA(a, b); // not inline : 450 ms
            // int i = powB(a, b); // inline : 469 ms
        }
    }
    double d = t->ms();
    cout << "--> " << d << endl;
    return 0;
}
Question 3: Why is performance so similar between powA and powB? I would have expected powB performance to be around 322 ms, since it is, after all, inline.
Upvotes: 5
Views: 534
Reputation: 58770
When an ordinary function is compiled, its machine code is compiled once and put in one place, separate from the other functions that call it. When executing the code, the processor has to jump to the place where that code is stored, and this jump instruction takes extra time to load the function from memory. Sometimes several jumps (or several loads and a jump) are needed to call a function, e.g. for virtual functions. There is also time spent saving and restoring registers and creating a stack frame, none of which is really necessary for sufficiently small inline functions.
When an inline function is compiled, all of its machine code is inserted directly into the place where it is called, so the time for the jump instruction is eliminated. The compiler also optimizes the code of the inline function based on its surroundings (e.g. register assignment can consider both the variables used outside the function and inside the function to minimize the number of registers that need to be saved). However, the inline function's code may appear in multiple places in the calling function (if it was called multiple times in the calling code), so on the whole it makes your codebase bigger. This can cause your code to grow large enough that it no longer fits in the CPU cache, in which case the processor has to go to main memory to fetch your code, and this takes longer than getting everything from cache. In some circumstances, this can offset the savings from eliminating the jump instruction, and make your code slower than if you had not inlined the code.
"Swapping" usually refers to the behavior of virtual memory, which has the same kinds of tradeoffs as the CPU cache, but the time it takes to load code from disk is much longer, and the amount of memory your program has to fill for this to come into play is much larger. You're unlikely to ever see inline functions affect virtual memory performance.
Obviously both effects don't happen at once but it's difficult to know which will apply in any given circumstance.
Upvotes: 1
Reputation: 1096
inline inserts the code at the call site, saving on creation of a stack frame, saving/restoring registers and a call (branch). In other words, using inline (when it works) is similar to writing the code of the inlined function in place of its call.
However, inline isn't guaranteed to do anything and is compiler-dependent. The compiler will sometimes inline functions that aren't declared inline (well, it's probably the linker that does that when link-time optimization is turned on, but it's easy to imagine situations where it can be done at the compiler level, e.g. when the inlined function is static).
If you want to force MSVC to inline functions, use __forceinline and check the assembly. There should be no calls: your code should compile to a simple sequence of instructions executed linearly.
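As a rough sketch of the above, a macro can select the compiler-specific "force inline" spelling. FORCE_INLINE and sumOfSquares are hypothetical names made up for illustration; __forceinline is the MSVC keyword mentioned above, and the always_inline attribute is assumed here as the GCC/Clang counterpart. Even these are strong requests rather than absolute guarantees, so checking the assembly is still the only way to be sure.

```cpp
#include <cassert>

// FORCE_INLINE is a hypothetical macro name, not a standard facility.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // fall back to the ordinary hint
#endif

// Same body as powB in the question, but with the stronger request.
FORCE_INLINE int powB(int a, int b) {
    return (a + b) * (a + b);
}

// Hypothetical caller: a small function whose generated assembly can be
// inspected to see whether a call to powB remains inside the loop.
int sumOfSquares(int n) {
    int total = 0;
    for (int a = 0; a < n; ++a) {
        total += powB(a, 1);
    }
    return total;
}
```

A quick sanity check such as assert(powB(2, 3) == 25) confirms that the macro does not change behaviour; whether the call actually disappears has to be read off the compiler's assembly listing.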
Regarding the speed: you can indeed make your code faster by inlining small functions. When you inline large functions, however (and "large" is hard to define; you need to run tests to determine what's large and what's not), your code size becomes bigger. That's because the code of the inlined function is repeated over and over again at the call sites. After all, the whole point of having a call to a function is to reduce the instruction count by reusing the same subroutine from multiple places in the code.
When the code size becomes larger, the instruction caches may be overwhelmed, leading to slower code execution.
Another point to consider: modern out-of-order CPUs (most desktop CPUs, e.g. Intel Core Duo or i7) have a mechanism (instruction trace) to prefetch branches ahead and "inline" them at the hardware level. So aggressive inlining doesn't always make sense.
In your example, you need to look at the assembly that your compiler generates. It may be the same for the inline and non-inline versions. If it doesn't inline, try __forceinline if it's MSVC that you're using. If the timing is the same in both cases, it means your CPU does a good job of prefetching instructions and the execution time bottleneck is elsewhere.
Upvotes: 4
Reputation: 3638
Both statements are true, sort of. Declaring a function inline is an indicator to the compiler to inline if able. The compiler will (usually) use its own judgment on whether or not to actually inline, but in C++ declaring it inline does change the code generation, at least for symbol generation.
"Swapping" in this context refers to paging the executable image to disk. Since the executable is larger, it may affect performance on memory-constrained systems.
Answering your third question, the compiler chose the same behavior (my guess is non-inline) for both functions.
Upvotes: 1
Reputation: 646
The book's statements are correct. In other words, when done properly, inline can improve performance, and when done improperly it can reduce performance.
It's best to inline only small functions. This removes the extra call and jump instructions that a function call would otherwise need. This is how performance is improved.
If you inline large functions, the code can grow beyond the cache size and hence cause additional memory swapping. This is how performance is hindered.
Upvotes: 1
Reputation: 29265
Swapping is an OS term about swapping different pages of memory in and out of the running process. Basically the swap takes some time. The bigger your app is, the more swapping it may have.
When you inline a function, instead of jumping to a single subroutine, a copy of the whole function is dumped at the calling location. This makes your program bigger, and hence in theory can lead to more swapping.
Normally for very small methods (like your powA and powB) inlining should be ok and result in faster execution, but it is really just "in theory" - there are probably "bigger fish to fry" in terms of squeezing the last drop of performance out of your code.
Upvotes: 2
Reputation: 993085
Yes, both statements can be true, in particular circumstances. Obviously they won't both be true at the same time.
"Swapping" is likely a reference to OS paging behaviour, where pages are swapped out to disk when the memory pressure becomes high.
In practice, if your inline functions are small then you will usually notice a performance improvement due to eliminating the overhead of a function call and return. However, in very rare circumstances, you may cause code to grow such that it cannot completely reside inside the CPU cache (during a performance-critical tight loop), and you may experience decreased performance. However, if you're coding at that level then you probably should be coding directly in assembly language anyway.
The inline modifier is a hint to the compiler that it might want to consider compiling the given function inline. It doesn't have to follow your directions, and the result may also depend on the given compiler options. You can always look at the generated assembly code to find out what it did.
Your benchmark may not even be doing what you want, because your compiler might be smart enough to see that you're not even using the result of the function call that you assign into i, so it might not even bother to call your function. Again, look at the generated assembly code.
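One way to guard against that, sketched here under some assumptions: accumulate the results and return the total, so the computation has an observable effect. The names sumGrid and timeSumGridMs are hypothetical, std::chrono::steady_clock stands in for the Timer class from the question (whose definition isn't shown), and a 64-bit unsigned type is used so the sum cannot overflow for a 9000 x 9000 grid. An optimizer may still transform the loop, so this only ensures the work is not discarded outright.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

inline std::uint64_t powB(std::uint64_t a, std::uint64_t b) {
    return (a + b) * (a + b);
}

// Hypothetical helper: sums powB over an n x n grid so that every
// result contributes to a value the caller actually uses.
std::uint64_t sumGrid(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t a = 0; a < n; ++a) {
        for (std::uint64_t b = 0; b < n; ++b) {
            total += powB(a, b);
        }
    }
    return total;
}

// Hypothetical helper: runs sumGrid once, stores its result in `result`,
// and returns the elapsed wall-clock time in milliseconds.
double timeSumGridMs(std::uint64_t n, std::uint64_t& result) {
    const auto start = std::chrono::steady_clock::now();
    result = sumGrid(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling timeSumGridMs(9000, result) from main and printing both the elapsed time and result keeps the loop's output live. Running it several times with optimization enabled, and comparing against the generated assembly, gives a fairer picture than a single timing.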
Upvotes: 5