Bhanuday Sharma
Bhanuday Sharma

Reputation: 433

Strange performance of C code: more instructions executes faster than fewer ones

I run following two codes in C using GCC compiler.

#include <stdio.h>
#include <time.h>

int
main()
{
    int i, j, k, l, step, count = 0;
    double duration;
    float multi;

    const clock_t begin_time = clock();

    for (step = 1; step < 10000; step++)
        for (k = 1; k < 27000; k++) {
            for (i = 1; i < 5; i++)
                for (j = 1; j < 9; j++) {
                    count++;            // INSTRUCTION-1
                }
        };

    duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC;

    printf("C program count = %d  \n", count);
    printf("clock = %f \n", duration);
}

The second code is as follows:

#include <stdio.h>
#include <time.h>

int
main()
{

    int i, j, k, l, step, count = 0;
    double duration;
    float multi;

    const clock_t begin_time = clock();

    for (step = 1; step < 10000; step++)
        for (k = 1; k < 27000; k++) {
            for (i = 1; i < 5; i++)
                for (j = 1; j < 9; j++) {
                    count++;            // INSTRUCTION-1
                    multi = 9.56587458 * 8.547458748;   // INSTRUCTION-2
                }
        };

    duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC;

    printf("C program count = %d  \n", count);
    printf("clock = %f \n", duration);
}

Both the codes are almost same. The only difference is that the first code has only one instruction inside the loop, whereas the second code has two instructions inside the loop. Therefore, I was expecting the second code should take longer to execute. However, to my surprise, The execution time of the first code was 22.45 seconds, whereas for the second code was 17.96 seconds. Why is the second code executed faster than the first code, even if it involves significantly more computations?

CPU used was Intel Xeon E5-2670V 2.5 GHz 2 CPU-IvyBridge (20-cores), if this information is relevant.

Upvotes: 2

Views: 134

Answers (2)

0___________
0___________

Reputation: 67855

If you compile with optimizations enabled (otherwise all performance tests do not have any sense ), the compiler will optimize all of your loops out and it will only assign the count with compile-time calculated value. Double multiplication will be optimized out (as you do not use the result at all) https://godbolt.org/z/9MebGc

But even if we change the program to force loops the expected difference will be very small (because loops will have much more instructions than the multiplication of two doubles). And it actually is:

void test(volatile double x, volatile double y)
{
    volatile unsigned  i, j, k, l, step, count = 0;
    double duration;
    double multi = 0.0;

    const clock_t begin_time = clock();

    for (step = 1; step < 1000; step++)
        for (k = 1; k < 27000; k++) {
            for (i = 1; i < 5; i++)
                for (j = 1; j < 9; j++) {
                    count++;            // INSTRUCTION-1
                    multi += x * y;   // INSTRUCTION-2
                }
        };

    duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC;

    printf("C program count = %d  \n", count);
    printf("clock = %f \n", duration);
    printf("%f\n", multi);
}

void test2(volatile double x, volatile double y)
{
    volatile unsigned  i, j, k, l, step, count = 0;
    double duration;
    double multi;

    const clock_t begin_time = clock();

    for (step = 1; step < 1000; step++)
        for (k = 1; k < 27000; k++) {
            for (i = 1; i < 5; i++)
                for (j = 1; j < 9; j++) {
                    count++;            // INSTRUCTION-1
                }
        };

    duration = (double) (clock() - begin_time) / CLOCKS_PER_SEC;

    printf("C program count = %d  \n", count);
    printf("clock = %f \n", duration);
    printf("%f\n", multi);
}
 
int main()
{
    srand(time(NULL));
    test((double)rand()/rand(), (double)rand()/rand());
    test2((double)rand()/rand(), (double)rand()/rand());
}

and the result:

C program count = 863104032  
clock = 1.467764 
962658070.362825
C program count = 863104032  
clock = 1.413076 
0.000000

https://godbolt.org/z/hh4n18

Upvotes: 3

wrosecrans
wrosecrans

Reputation: 1085

If you look at the assembly output at https://godbolt.org/z/136qsc

This function:

double multiply(void) {
    return 9.56587458 * 8.547458748; ;
}

doesn't include a multiply instruction, or the constants 9.56587458 or 8.547458748. The compiler notices that the result can be calculated at compile time, so there's no reason to have that code in the output or to execute it at runtime. You aren't doing what you think you are doing with your two examples, so it makes sense that the one that adds no additional complexity isn't significantly slower.

Upvotes: 3

Related Questions