Dan Lincan
Dan Lincan

Reputation: 1065

g++ -O optimizations

I have the following code:

#include <iostream>
int main()
{
        int n = 100;
        long a = 0;
        int *x = new int[n];

        for(int i = 0; i < n; ++i)
                for(int j = 0; j < n; ++j)
                        for(int k = 0; k < n; ++k)
                                for(int l = 0; l < n; ++l)
                                {   

                                        a += x[(j + k + l) % 100];
                                }   
        std::cout << a << std::endl;
        delete[] x;
        return 0;
}

If I compile without optimizations g++ test.cc and then run time ./a.out it will display 0.7s. However when I compile it with -O, the time decreases by 2 times.

Compiler used

g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2

My question

How can I rewrite the code so when compiling without -O I can obtain same time ( or close to it ) ? In other words, how to optimize nested loops manually ?

Why I ask

I have a similar code which runs about 4 times faster if I use -O optimization.

PS: I've looked in the manual of the compiler but there are way too many flags there to test which one really makes the difference.

Upvotes: 2

Views: 1198

Answers (5)

celtschk
celtschk

Reputation: 19731

Most of the things the compiler optimizes with -O are things on a level below C++. For example, all variables live in memory, including your loop variables. Therefore without optimization, the compiler will most likely on each iteration of the inner loop first read the loop variable to compare it with 0, inside the loop load it again in order to use it for the index, and then at the end of the loop will read the value again, increment it, and write it back. With optimization, it will notice that the loop variable is not changed in the loop body, and therefore does not need to get re-read from memory each time. Moreover it will also note that the address of the variable is never taken, so no other code will ever access it, and therefore writing it to memory can also be omitted. That is, the loop variable will live only in memory. This optimization alone will save three hundred million memory reads and one hundred million memory writes during execution of your function. But since things like processor registers and memory reads/writes are not exposed at the language level, there's no way to optimize it at the language level.

Furthermore, it doesn't make sense to hand-optimize things which the compiler optimizes anyway. Better spend your time at optimizing things which the compiler cannot optimize.

Upvotes: 3

Nicol Bolas
Nicol Bolas

Reputation: 474436

How can I rewrite the code so when compiling without -O I can obtain same time ( or close to it ) ? In other words, how to optimize nested loops manually ?

You write them in assembly. Good luck with that, BTW.

The purpose of the optimization switches in compilers is for you to say, "Compiler, I want you to try to generate fast assembly." By default, compilers do not do optimizations, because those optimizations might inhibit the debugability of the resulting code. So you have to specifically ask for them.

The things the compiler does to optimize code are, generally speaking, not something you can do manually. Some can be (like loop unrolling), but others can't be.

Upvotes: 1

Mark B
Mark B

Reputation: 96311

You're using a polynomial time algorithm (n ** 4) so it stands to reason it'll be slow, especially for larger n. Perhaps an algorithm with a lower complexity would help you?

If you want your code optimized you can either ask the compiler to optimize for you, or just write it in assembly language and be done with it. Don't try to second-guess the C++ compiler.

Upvotes: 1

Ben Jackson
Ben Jackson

Reputation: 93910

You could observe that your loops form a set of coefficients coeff[i] such that a single loop summing a[i] * coeff[i] produces the same total as the nested loop. If we assume your index was meant to be (i + j + k) % 100 then coeff[i] = n * n * n for all i, so your whole program simplifies to

long coeff = n * n * n;
for (int i = 0; i < n; ++n)
    a += x[i] * coeff;

which I'm sure runs in far less than 0.7 seconds.

Upvotes: 1

Karoly Horvath
Karoly Horvath

Reputation: 96326

This code has undefined behaviour, so actually the optimizer can do whatever it wants..

a += x[j + k + l % 100];

should be:

a += x[(j + k + l) % 100];

If you fix this I still don't get why you want to optimize something that actually doesn't do anything...

Personally I would optimize this to: :)

std::cout << 0 << std::endl;

Note: remove the loop for i and do std::cout << a * n << std::endl;

Upvotes: 3

Related Questions