Reputation: 1801

Low level C/C++ performance?

UPDATE: I just managed to beat my own 32 if code:

void test(char *file_char, unsigned int size)
{
    char* file_ = file_char;
    char* size_x = file_char+size;
    char to_find = 0;

    for(unsigned int i = 0; i < 10000; i++)
    {
        file_char = file_;

        while(*file_char++ != to_find);//skip all characters till we find a 0

        if(*file_char)//some if in order to avoid compiler removing our test code
            cout << "found";
    }
}

The above code requires the 0 to appear at least once in the array otherwise an error will appear BUT it's a bit faster than the if code and MUCH more compact.

Is there any way to make the above code faster? (having a char array and trying to find the position a char appears)?

I wrote some code and I am really puzzled.

init:

int main()
{
    FILE *file;
    file = fopen("C:\\data.txt", "rb");

    static const int size = 60000;

    char *file_char = (char*)malloc(size);

    unsigned int i = 0;
    while(i < size)
        fread(&file_char[i++], 1, 1, file);

    clock_t clock_ = clock();
    test(file_char, size);
    std::cout << ((double)clock()-clock_)/1000;
    return 0;
}

The code bellow takes 3.5 seconds to execute:

void test(char *file_char, unsigned int size)
{
    for(unsigned int i = 0; i < 100000; i++)
    {
        unsigned int pos = 0;
        char to_find = 0;
        while(pos < size)
            if(file_char[pos++] == to_find)
                std::cout << "found";
    }
}

BUT the code bellow takes 1.8 seconds, HALF the time!

void test(char *file_char, unsigned int size)
{
    for(unsigned int i = 0; i < 100000; i++)
    {
        unsigned int pos = 0;
        char to_find = 0;
        while(pos < size)
        {
            if(file_char[pos] == to_find)
                std::cout << "found";
            else if(file_char[pos+1] == to_find)
                std::cout << "found";
            else if(file_char[pos+2] == to_find)
                std::cout << "found";
            else if(file_char[pos+3] == to_find)
                std::cout << "found";
            else if(file_char[pos+4] == to_find)
                std::cout << "found";
            else if(file_char[pos+5] == to_find)
                std::cout << "found";
            else if(file_char[pos+6] == to_find)
                std::cout << "found";
            else if(file_char[pos+7] == to_find)
                std::cout << "found";
            else if(file_char[pos+8] == to_find)
                std::cout << "found";
            else if(file_char[pos+9] == to_find)
                std::cout << "found";
            else if(file_char[pos+10] == to_find)
                std::cout << "found";
            else if(file_char[pos+11] == to_find)
                std::cout << "found";
            else if(file_char[pos+12] == to_find)
                std::cout << "found";
            else if(file_char[pos+13] == to_find)
                std::cout << "found";
            else if(file_char[pos+14] == to_find)
                std::cout << "found";
            else if(file_char[pos+15] == to_find)
                std::cout << "found";
            else if(file_char[pos+16] == to_find)
                std::cout << "found";
            else if(file_char[pos+17] == to_find)
                std::cout << "found";
            else if(file_char[pos+18] == to_find)
                std::cout << "found";
            else if(file_char[pos+19] == to_find)
                std::cout << "found";
            else if(file_char[pos+20] == to_find)
                std::cout << "found";
            else if(file_char[pos+21] == to_find)
                std::cout << "found";
            else if(file_char[pos+22] == to_find)
                std::cout << "found";
            else if(file_char[pos+23] == to_find)
                std::cout << "found";
            else if(file_char[pos+24] == to_find)
                std::cout << "found";
            else if(file_char[pos+25] == to_find)
                std::cout << "found";
            else if(file_char[pos+26] == to_find)
                std::cout << "found";
            else if(file_char[pos+27] == to_find)
                std::cout << "found";
            else if(file_char[pos+28] == to_find)
                std::cout << "found";
            else if(file_char[pos+29] == to_find)
                std::cout << "found";
            else if(file_char[pos+30] == to_find)
                std::cout << "found";
            else if(file_char[pos+31] == to_find)
                std::cout << "found";

            pos+=32;
        }
    }
}

I am using Visual Studio 2012 x64 and the program never couts anything, because no char is 0. How can this be explained? How to archive the same performance without using 32 ifs?

edit 1: If I create 64 ifs, there is NO speed increase over the 32 ifs version.

edit 2: If I remove the else and leave the ifs the program takes 4 seconds.

Now, how can the above unreasonable results be explained?

Upvotes: 5

Answers (5)

manlio

Reputation: 18972

I've made some test just to be sure.

With g++ (under Linux and Windows) I've the same results you have with Visual Studio:

Version 1 (no explicit loop unrolling)

g++ -O3 7.5s

Version 2 (explicit loop unrolling)

g++ -O3 2.1s

but with the -funroll-loops option turned on (usually this optimization is not enabled by default because it may or may not make it run faster):

Version 1 (no explicit loop unrolling)

g++ -O3 -funroll-loops 2.2s

Version 2 (explicit loop unrolling)

g++ -O3 -funroll-loops 2.2s

So this is related to loop unrolling.

EDIT

You could change your last example to explicitly insert a sentry, something like:

int main()
{
  static const int size = 60000;

  char *file_char = (char*)malloc(size+1);  // The last element is the sentry

  // ...Fill file_char[]...

  file_char[size] = 0;  // the sentry

  // ...
}

so test function won't fail (of course you have to check if you found the sentry or a "good" zero, but it's just one if).

Version 3 (sentry)

g++ -O3 0.68s

g++ -O3 -funroll-loops 0.72s

Upvotes: 4

jester

Reputation: 3489

This is an optimization technique usually applied by some optimizing compilers known as loop unrolling optimization. In the first code for loop has to run 10,000 iterations whereas in the second, the number of iterations is reduced to floor(10,000/32). Over the course of many iterations, the end of the for loop which is compiled as a jump instruction to the start of the loop, (unconditional jumps are quite expensive in machine code since it may cause flushing of instruction buffers in the CPU) is executed much less often and so is the loop test and loop counter update instruction.Over a considerable number of iterations this represents a significant improvement in execution time. Also although the nummber of of else tests within the loop increased considerably, they will be compiled into a jump table of similar to :

if( condition1 ) goto found

if( condition2 ) goto found

...

found:

which is gives significantly better perfomance.

Upvotes: 1

Irisciences

Reputation: 308

I think the two codes are differents.

In the first one you check the 'if' comparison each time.

In the second one, if the first is good, you skip all the following ones! (because of the else) So your saving a lot of comparisons (but missing checks).

For having the same code, you have to delete all the 'else'.

Upvotes: 5

Grady Player

Reputation: 14539

well in your second example once a match is made, it skips the rest of the comparisons... if you can guarantee that yo will only have one to_find per 32 indices then that would be viable... but you could also just re-write (could have a 1 off error):

void test(char *file_char, unsigned int size)
{
    for(unsigned int i = 0; i < 100000; i++)
    {
        unsigned int pos = 0;
        char to_find = 0;
        int skip = 32;
        while(pos < size)
        {
            if(file_char[pos++] == to_find)
            {
                std::cout << "found";
                pos+=skip;
            }
            skip--;
            if (!skip)
            {skip = 32;}
        }
    }
}

Upvotes: 3

Zebra North

Reputation: 11492

Your loop basically consists of two comparisons: pos < size and file_char[pos] == to_find. By unrolling the loop, you're reducing the number of comparisons from 2 * size to (size + size / 32).

Upvotes: 9

Low level C/C++ performance?

Answers (5)

Related Questions