mike

Reputation: 329

Why does performance of functions differ when I compile them separately and link?

First off, the CODE
randoms.hpp

int XORShift();
int GameRand();

randoms.cpp

static unsigned int x = 123456789;
static unsigned int y = 362436069;
static unsigned int z = 521288629;
static unsigned int w = 88675123;
int XORShift()
{
    unsigned int t = x ^ (x << 11);
    x = y;
    y = z;
    z = w;
    return w = w ^ (w >> 19) ^ (t ^ (t >> 8));
}

static unsigned int high = 0xDEADBEEF;
static unsigned int low = high ^ 0x49616E42;
int GameRand()
{
    high = (high << 16) + (high >> 16);
    high += low;
    low += high;
    return high;
}

main.cpp

#include <cstdlib>   // std::rand
#include <iostream>
#include <windows.h>
#include "randoms.hpp"
using namespace std;

//Windows specific performance tracking
long long milliseconds_now() { 
    static LARGE_INTEGER s_frequency;
    static BOOL s_use_qpc = QueryPerformanceFrequency(&s_frequency);
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    return (1000LL * now.QuadPart) / s_frequency.QuadPart;
}

int main() {
    const int numCalls = 100000000; //100 mil
    {
        cout << "XORShift..." << endl;
        long long start = milliseconds_now();
        for(int i=0; i<numCalls; i++)
            XORShift();
        long long elapsed = milliseconds_now() - start;
        cout << "\tms: " << elapsed << endl;
    }
    {
        cout << "GameRand..." << endl;
        long long start = milliseconds_now();
        for(int i=0; i<numCalls; i++)
            GameRand();
        long long elapsed = milliseconds_now() - start;
        cout << "\tms: " << elapsed << endl;
    }
    {
        cout << "std::rand..." << endl;
        long long start = milliseconds_now();
        for(int i=0; i<numCalls; i++)
            std::rand();
        long long elapsed = milliseconds_now() - start;
        cout << "\tms: " << elapsed << endl;
    }
}

Details
I am using C++ and Microsoft's "cl" compiler. I am testing the performance of three pseudo-random number functions: XORShift, GameRand, and std::rand().

Building main.cpp and randoms.cpp separately and linking with the command

cl /O2 /Oi main.cpp randoms.cpp

yields the following performance results:

XORShift...
    ms: 520
GameRand...
    ms: 2056
std::rand...
    ms: 3800

However, if I skip the header and include the function definitions directly via

#include "randoms.cpp"

and compile without any linking

cl /O2 /Oi main.cpp

I get very different performance:

XORShift...
    ms: 234
GameRand...
    ms: 135
std::rand...
    ms: 3823

Both XORShift and GameRand get dramatic speedups. It's very strange that GameRand goes from slower than XORShift to faster. How can I get the speed of the second test but still compile randoms.cpp separately and link?

** EDIT **:
Issue resolved thanks to the comment from @sehe and answers from @Oswald and @Tomasz Kłak. I am now compiling with the command

cl /O2 /Oi /GL main.cpp randoms.cpp

The /GL flag enables whole-program optimization, which defers code generation to link time, so I can compile the files separately and still get the inlining.

Upvotes: 2

Views: 128

Answers (3)

sehe

Reputation: 392954

Two things come to mind.

Firstly, inlining might be affected: when the function bodies are unavailable while compiling the call-site translation units, the compiler can't inline the code. In modern C++, inlining opens up a lot of optimization potential, since the compiler will frequently inline several levels of calls, and the resulting body often gives rise to even more interesting optimizations.

Secondly, many compilers nowadays have a link-time optimization flag that lets you have your cake and eat it too. This could benefit your situation

  • at the cost of increased link time
  • as long as the statically linked objects contain the relevant definitions (i.e., not with dynamic linking)
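
If link-time optimization is not an option, another way to make the bodies visible at every call site is to define the functions inline in a header. A minimal sketch for XORShift, assuming a hypothetical header name randoms_inline.hpp and a hypothetical XorShiftState struct, neither of which is in the original code:

```cpp
// randoms_inline.hpp (hypothetical header-only variant of randoms.hpp)
#ifndef RANDOMS_INLINE_HPP
#define RANDOMS_INLINE_HPP

// Hypothetical struct bundling the four state words from randoms.cpp.
struct XorShiftState { unsigned int x, y, z, w; };

// The state lives in a function-local static of an inline function, so the
// header can be included from several translation units without causing
// duplicate-definition errors at link time.
inline XorShiftState& xorshift_state() {
    static XorShiftState s = {123456789u, 362436069u, 521288629u, 88675123u};
    return s;
}

// Same arithmetic as the original XORShift(), but the body is now visible
// to every translation unit that includes this header.
inline int XORShift() {
    XorShiftState& s = xorshift_state();
    unsigned int t = s.x ^ (s.x << 11);
    s.x = s.y;
    s.y = s.z;
    s.z = s.w;
    s.w = s.w ^ (s.w >> 19) ^ (t ^ (t >> 8));
    return static_cast<int>(s.w);
}

#endif // RANDOMS_INLINE_HPP
```

Whether the compiler actually inlines the call is still its decision; the inline keyword only makes the definition legal in multiple translation units.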

Upvotes: 1

tumdum

Reputation: 2031

It's because of inlining. When the compiler sees the function definitions while compiling main.cpp, it can inline them at the call sites instead of generating code for an actual function call, so you save the cost of setting up call frames.
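
One caveat when comparing the two builds: the benchmark discards every return value, so once the bodies are inlined the optimizer may be free to remove more than just the call overhead. A minimal sketch of a timing loop that keeps the results observable, assuming a hypothetical helper named time_calls (not part of the question) and using std::chrono instead of the Windows-specific timer:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: times `calls` invocations of `f` and accumulates the
// return values, so the work cannot simply be discarded as unused.
template <typename F>
long long time_calls(F f, int calls, std::uint32_t& checksum) {
    auto start = std::chrono::steady_clock::now();
    std::uint32_t sum = 0;
    for (int i = 0; i < calls; ++i)
        sum += static_cast<std::uint32_t>(f());
    auto stop = std::chrono::steady_clock::now();
    checksum = sum;  // the caller should print or otherwise use this value
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A call such as time_calls([] { return XORShift(); }, numCalls, sum), followed by printing sum, would slot into the existing main() in place of each hand-written loop.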

Upvotes: 1

Oswald

Reputation: 31647

If the function is used in the same translation unit in which it is defined, that usage can be inlined, thereby eliminating the overhead of a function call.

Upvotes: 1
