James Leonard

Reputation: 3753

Few questions about C++ inline functions

The training materials from the class I took seem to be making two conflicting statements.

On one hand:

"Use of inline functions usually results in faster execution"

On the other hand:

"Use of inline functions may decrease performance due to more frequent swapping"

Question 1: Are both statements true?

Question 2: What is meant by "swapping" here?

Please glance at this snippet:

#include <iostream>
using namespace std;

// Timer: the poster's own stopwatch class (definition not shown in the question).

int powA(int a, int b) {
  return (a + b)*(a + b) ;
}

inline int powB(int a, int b) {
  return (a + b)*(a + b) ;
}

int main () {
    Timer t;

    for(int a = 0; a < 9000; ++a) {
        for(int b = 0; b < 9000; ++b) {
            int i = (a + b)*(a + b);       //              322 ms   <-----
            //  int i = powA(a, b);         // not inline : 450 ms
            //  int i = powB(a, b);         // inline :     469 ms
        }
    }

    double d = t.ms();
    cout << "-->  " << d << endl;

    return 0;
}

Question 3: Why is performance so similar between powA and powB? I would have expected powB to run in around 322 ms since it is, after all, inline.

Upvotes: 5

Views: 534

Answers (6)

Ken Bloom

Reputation: 58770

When an ordinary function is compiled, its machine code is generated once and placed in one location, separate from the functions that call it. At run time, the processor has to jump to that location, and the jump (along with fetching the function's instructions from the new location) takes extra time. Sometimes several jumps (or several loads and a jump) are needed to call a function, e.g. for virtual functions. Time is also spent saving and restoring registers and creating a stack frame, none of which is really necessary for sufficiently small inline functions.

When an inline function is compiled, its machine code is inserted directly at the place where it is called, so the jump is eliminated. The compiler can also optimize the inlined code in light of its surroundings (e.g. register allocation can consider the variables used both outside and inside the function, minimizing the number of registers that need to be saved). However, the inline function's code may appear in multiple places in the calling code (once for every call site), so on the whole it makes your program bigger. This can cause your code to grow large enough that it no longer fits in the CPU cache, in which case the processor has to go to main memory to fetch it, and that takes longer than getting everything from cache. In some circumstances, this can offset the savings from eliminating the jump and make your code slower than if you had not inlined it.

"Swapping" usually refers to the behavior of virtual memory, which has the same kinds of tradeoffs as the CPU cache, but the time it takes to load code from disk is much longer, and the amount of memory your program has to fill for this to come into play is much larger. You're unlikely to ever see inline functions affect virtual memory performance.

Obviously the two effects don't both apply at the same time, but it's difficult to know in advance which one will dominate in any given circumstance.

Upvotes: 1

Sergiy Migdalskiy

Reputation: 1096

inline inserts the code at the call site, saving on creation of stack frame, saving/restoring registers and a call (branch). In other words, using inline (when it works) is similar to writing the code for inlined function in place of its call.

However, inline isn't guaranteed to do anything, and its effect is compiler-dependent. The compiler will sometimes inline functions that aren't declared inline (well, it's probably the linker that does that when link-time optimization is turned on, but it's easy to imagine situations where it can be done at the compiler level, e.g. when the inlined function is static).

If you want to force MSVC to inline functions, use __forceinline and check the assembly. There should be no calls: your code should compile to a simple sequence of instructions executed linearly.

Regarding the speed: you can indeed make your code faster by inlining small functions. When you inline large functions, however (and "large" is hard to define; you need to run tests to determine what's large and what's not), your code size becomes bigger. That's because the code of the inlined function is repeated over and over again at the call sites. After all, the whole point of calling a function is to save on instruction count by reusing the same subroutine from multiple places in the code.

When the code size becomes larger, the instruction caches may be overwhelmed, leading to slower code execution.

Another point to consider: modern out-of-order CPUs (most desktop CPUs, e.g. Intel Core Duo or i7) have a mechanism (instruction trace) to prefetch branches ahead of time and "inline" them at the hardware level. So aggressive inlining doesn't always make sense.

In your example, you need to see the assembly that your compiler generates. It may be the same for the inline and non-inline versions. If it doesn't inline, try __forceinline if it's MSVC that you're using. If the timing is the same in both cases, it means your CPU does a good job at prefetching instructions and the execution time bottleneck is elsewhere.

Upvotes: 4

chmeee

Reputation: 3638

Both statements are true, sort of. Declaring a function inline is an indicator to the compiler to inline if able. The compiler will (usually) use its own judgment on whether or not to actually inline, but in C++ declaring it inline does change the code generation, at least for symbol generation.

"Swapping" in this context refers to paging the executable image to disk. Since the executable is larger, it may affect performance on memory-constrained systems.

Answering your third question, the compiler chose the same behavior (my guess is non-inline) for both functions.

Upvotes: 1

MartyE

Reputation: 646

The book's statements are correct. In other words, when done properly, inlining can improve performance, and when done improperly, it can reduce performance.

It's best to inline only small functions. This eliminates the call and return instructions that jump to another place in memory, which is how performance is improved.

If you inline large functions, the code can grow beyond the cache size, causing more cache misses and, in the extreme, additional memory swapping. This is how performance is hindered.

Upvotes: 1

John3136

Reputation: 29265

Swapping is an OS term for moving pages of memory in and out of the running process. Each swap takes some time, and the bigger your app is, the more swapping it may need.

When you inline a function, instead of jumping to a single subroutine, a copy of the whole function is dumped at the calling location. This makes your program bigger, and hence in theory can lead to more swapping.

Normally for very small methods (like your powA and powB) inlining should be ok and result in faster execution, but it is really just "in theory" - there are probably "bigger fish to fry" in terms of squeezing the last drop of performance out of your code.

Upvotes: 2

Greg Hewgill

Reputation: 993085

Question 1

Yes, both statements can be true, in particular circumstances. Obviously they won't both be true at the same time.

Question 2

"Swapping" is likely a reference to OS paging behaviour, where pages are swapped out to disk when the memory pressure becomes high.

In practice, if your inline functions are small then you will usually notice a performance improvement due to eliminating the overhead of a function call and return. However, in very rare circumstances, you may cause code to grow such that it cannot completely reside inside the CPU cache (during a performance-critical tight loop), and you may experience decreased performance. However, if you're coding at that level then you probably should be coding directly in assembly language anyway.

Question 3

The inline modifier is a hint to the compiler that it might want to consider compiling the given function inline. It doesn't have to follow your directions, and the result may also depend on the given compiler options. You can always look at the generated assembly code to find out what it did.

Your benchmark may not even be doing what you want because your compiler might be smart enough to see that you're not even using the result of the function call that you assign into i, so it might not even bother to call your function. Again, look at the generated assembly code.

Upvotes: 5
