Strange benchmark results

Question

I wrote the following benchmark:

#include <iostream> // cout
#include <cmath>    // pow
#include <chrono>   // high_resolution_clock
#include <cstdint>  // int64_t

using namespace std;
using namespace std::chrono;

int64_t calculate(int);

int main()
{
    high_resolution_clock::time_point t1, t2;

    // Test 1
    t1 = high_resolution_clock::now();
    calculate(200);
    t2 = high_resolution_clock::now();

    cout << "RUNTIME = " << duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;

    // Test 2   
    t1 = high_resolution_clock::now();
    calculate(200000);
    t2 = high_resolution_clock::now();

    cout << "RUNTIME = " << duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;
}

int64_t calculate(const int max_exponent)
{
    int64_t num = 0;

    for(int i = 0; i < max_exponent; i++)
    {
        num += pow(2, i);
    }

    return num;
}

Running this benchmark on the Odroid XU3 produces the following output (8 runs):

RUNTIME TEST 1 = 1250 nano seconds
RUNTIME TEST 2 = 1041 nano seconds

RUNTIME TEST 1 = 1292 nano seconds
RUNTIME TEST 2 = 1042 nano seconds

RUNTIME TEST 1 = 1250 nano seconds
RUNTIME TEST 2 = 1083 nano seconds

RUNTIME TEST 1 = 1292 nano seconds
RUNTIME TEST 2 = 1083 nano seconds

RUNTIME TEST 1 = 1209 nano seconds
RUNTIME TEST 2 = 1084 nano seconds

RUNTIME TEST 1 = 1166 nano seconds
RUNTIME TEST 2 = 1083 nano seconds

RUNTIME TEST 1 = 1292 nano seconds
RUNTIME TEST 2 = 1042 nano seconds

RUNTIME TEST 1 = 1166 nano seconds
RUNTIME TEST 2 = 1250 nano seconds

RUNTIME TEST 1 = 1250 nano seconds
RUNTIME TEST 2 = 1250 nano seconds

The second maximum exponent is 1000 times greater than the first one. Why does the second call sometimes finish faster?

I used GCC 4.8 as the compiler with the -Ofast flag.

Update: I could reproduce similar behaviour on my i7 4770k.

Jerry Coffin · Accepted Answer

The short answer is "dead code elimination". The compiler sees that you never use the result from calling the function (and the function has no side effects), so it just eliminates calling the function.

Print out the result from the function, and things change a bit. E.g.:

Ignore: -9223372036854775808    RUNTIME = 0 nano seconds
Ignore: -9223372036854775808    RUNTIME = 23001300 nano seconds

Modified code, in case you care:

#include <iostream> // cout
#include <cmath>    // pow
#include <chrono>   // high_resolution_clock
#include <cstdint>  // int64_t

using namespace std;
using namespace std::chrono;

int64_t calculate(int);

int main() {
    high_resolution_clock::time_point t1, t2;

    // Test 1
    t1 = high_resolution_clock::now();
    auto a = calculate(200);
    t2 = high_resolution_clock::now();
    std::cout << "Ignore: " << a << "\t";

    cout << "RUNTIME = " << duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;

    // Test 2   
    t1 = high_resolution_clock::now();
    auto b = calculate(200000);
    t2 = high_resolution_clock::now();
    std::cout << "Ignore: " << b << "\t";

    cout << "RUNTIME = " << duration_cast<nanoseconds>(t2 - t1).count() << " nano seconds" << endl;
}

int64_t calculate(const int max_exponent) {
    int64_t num = 0;

    for (int i = 0; i < max_exponent; i++) {
        num += pow(2, i);
    }

    return num;
}

Beyond that, there's the minor detail that you're overflowing the range of an int64_t (many times over), which gives undefined behavior. Still, with this change there's at least a reasonable hope that the printed times reflect the time needed to carry out the specified calculations.
