Reputation: 329
Question: Why does performance of functions differ when I compile them separately and link?
First off, the CODE
randoms.hpp
int XORShift();
int GameRand();
randoms.cpp
static unsigned int x = 123456789;
static unsigned int y = 362436069;
static unsigned int z = 521288629;
static unsigned int w = 88675123;
int XORShift()
{
unsigned int t = x ^ (x << 11);
x = y;
y = z;
z = w;
return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}
static unsigned int high = 0xDEADBEEF;
static unsigned int low = high ^ 0x49616E42;
int GameRand()
{
high = (high << 16) + (high >> 16);
high += low;
low += high;
return high;
}
main.cpp
#include <cstdlib> // std::rand
#include <iostream>
#include <windows.h>
#include "randoms.hpp"
using namespace std;
//Windows specific performance tracking
long long milliseconds_now() {
static LARGE_INTEGER s_frequency;
static BOOL s_use_qpc = QueryPerformanceFrequency(&s_frequency);
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
return (1000LL * now.QuadPart) / s_frequency.QuadPart;
}
int main() {
const int numCalls = 100000000; //100 mil
{
cout << "XORShift..." << endl;
long long start = milliseconds_now();
for(int i=0; i<numCalls; i++)
XORShift();
long long elapsed = milliseconds_now() - start;
cout << "\tms: " << elapsed << endl;
}
{
cout << "GameRand..." << endl;
long long start = milliseconds_now();
for(int i=0; i<numCalls; i++)
GameRand();
long long elapsed = milliseconds_now() - start;
cout << "\tms: " << elapsed << endl;
}
{
cout << "std::rand..." << endl;
long long start = milliseconds_now();
for(int i=0; i<numCalls; i++)
std::rand();
long long elapsed = milliseconds_now() - start;
cout << "\tms: " << elapsed << endl;
}
}
Details
I am using C++ and Microsoft's "cl" compiler. I am testing the performance of three pseudo-random number functions: XORShift, GameRand, and std::rand().
Building main.cpp and randoms.cpp separately and linking with the command
cl /O2 /Oi main.cpp randoms.cpp
yields the following performance results:
XORShift...
ms: 520
GameRand...
ms: 2056
std::rand...
ms: 3800
However, if I skip the header and include the function definitions directly via
#include "randoms.cpp"
and compile everything as a single translation unit
cl /O2 /Oi main.cpp
I get very different performance:
XORShift...
ms: 234
GameRand...
ms: 135
std::rand...
ms: 3823
Both XORShift and GameRand get dramatic speedups. It's very strange that GameRand goes from slower than XORShift to faster. How can I get the speed of the second test, but still compile randoms.cpp separately and link?
** EDIT **:
Issue resolved thanks to the comment from @sehe and answers from @Oswald and @Tomasz Kłak. I am now compiling with the command
cl /O2 /Oi /GL main.cpp randoms.cpp
The /GL flag enables whole-program optimization (link-time code generation), so I can compile the files separately and still get the inlining.
Upvotes: 2
Views: 128
Reputation: 392954
Two things come to mind.
Firstly, inlining might be affected: when the function bodies are unavailable while compiling the call-site translation units, the compiler can't inline the code. In modern C++, inlining offers huge optimization potential, since the compiler will frequently inline several levels of calls, and the resulting body often opens the door to even more interesting optimizations.
Many compilers nowadays have a link-time optimization flag that lets you have your cake and eat it too. This could benefit your situation.
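One caveat when comparing the two builds, offered as an assumption rather than a diagnosis: the loops in main.cpp discard every return value, so once the bodies are inlined the optimizer is free to drop part of the work. A timing helper that folds each result into a value you print afterwards makes the comparison a bit fairer. The names timeCalls and sink are made up for this sketch, and it uses std::chrono instead of QueryPerformanceCounter only to stay portable.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: calls next() `calls` times, folds every result into
// `sink`, and returns the elapsed wall-clock time in milliseconds.
template <typename Next>
long long timeCalls(Next next, int calls, std::uint32_t& sink)
{
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t acc = 0;
    for (int i = 0; i < calls; ++i)
        acc ^= static_cast<std::uint32_t>(next()); // consume each result
    const auto stop = std::chrono::steady_clock::now();
    sink = acc; // the caller should print or otherwise use this value
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Something like timeCalls(XORShift, numCalls, sink), followed by printing sink, would replace each timing block in main.cpp. Even then the optimizer may still simplify the loop, so treat the numbers as rough.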
Upvotes: 1
Reputation: 2031
It's because of inlining. When main.cpp includes randoms.cpp, the compiler sees the function definitions while compiling main.cpp, so it can inline them at the call sites instead of generating code for an actual function call. You save on call frames.
Upvotes: 1
Reputation: 31647
If the function is used in the same translation unit in which it is defined, that usage can be inlined, thereby eliminating the overhead of a function call.
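If link-time optimization isn't an option, a header-only variant is another way to put the definitions in the calling translation unit. A minimal sketch, assuming a hypothetical header named randoms_inline.hpp that replaces both randoms.hpp and randoms.cpp. The generator state moves into function-local statics so the header can be included from several .cpp files without multiple-definition errors. It has not been measured against the /GL build.

```cpp
// randoms_inline.hpp (hypothetical header-only replacement for randoms.hpp/.cpp)
#ifndef RANDOMS_INLINE_HPP
#define RANDOMS_INLINE_HPP

// `inline` allows the definition to appear in every translation unit that
// includes this header; the function-local statics are still shared program-wide.
inline int XORShift()
{
    static unsigned int x = 123456789;
    static unsigned int y = 362436069;
    static unsigned int z = 521288629;
    static unsigned int w = 88675123;

    unsigned int t = x ^ (x << 11);
    x = y;
    y = z;
    z = w;
    return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}

inline int GameRand()
{
    static unsigned int high = 0xDEADBEEF;
    static unsigned int low = 0xDEADBEEF ^ 0x49616E42; // same value as high ^ 0x49616E42

    high = (high << 16) + (high >> 16);
    high += low;
    low += high;
    return high;
}

#endif // RANDOMS_INLINE_HPP
```

main.cpp would then include this header instead of randoms.hpp, and randoms.cpp would drop out of the build command.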
Upvotes: 1