Reputation: 9518

Does GCC cache loop variables?

When I have a loop like:

for (int i = 0; i < SlowVariable; i++)
{
   //
}

I know that in VB6 the SlowVariable is accessed every iteration of the loop, making the following much more efficient:

int cnt = SlowVariable;

for (int i = 0; i < cnt; i++)
{
   //
}

Do I need the make the same optimizations in GCC? Or does it evaluate SlowVariable only once?

Upvotes: 2

Answers (6)

user1129665

Reputation:

This depends on how the compiler optimize, for example here:

#include <stdio.h>

int main(int argc, char **argv)
{
    unsigned int i;
    unsigned int z = 10;

    for( i = 0 ; i < z ; i++ )
        printf("%d\n", i);

    return 0;
}

If you compiled it using gcc example.c -o example, the result code will be:

 0x0040138c <+0>:     push   ebp
 0x0040138d <+1>:     mov    ebp,esp
 0x0040138f <+3>:     and    esp,0xfffffff0
 0x00401392 <+6>:     sub    esp,0x20
 0x00401395 <+9>:     call   0x4018f4 <__main>
 0x0040139a <+14>:    mov    DWORD PTR [esp+0x18],0xa
 0x004013a2 <+22>:    mov    DWORD PTR [esp+0x1c],0x0
 0x004013aa <+30>:    jmp    0x4013c4 <main+56>
 0x004013ac <+32>:    mov    eax,DWORD PTR [esp+0x1c]
 0x004013b0 <+36>:    mov    DWORD PTR [esp+0x4],eax
 0x004013b4 <+40>:    mov    DWORD PTR [esp],0x403064
 0x004013bb <+47>:    call   0x401b2c <printf>
 0x004013c0 <+52>:    inc    DWORD PTR [esp+0x1c]
 0x004013c4 <+56>:    mov    eax,DWORD PTR [esp+0x1c]  ; (1) 
 0x004013c8 <+60>:    cmp    eax,DWORD PTR [esp+0x18]  ; (2)
 0x004013cc <+64>:    jb     0x4013ac <main+32>
 0x004013ce <+66>:    mov    eax,0x0
 0x004013d3 <+71>:    leave
 0x004013d4 <+72>:    ret
 0x004013d5 <+73>:    nop
 0x004013d6 <+74>:    nop
 0x004013d7 <+75>:    nop

The value of i will be movied from the stack into eax.
Then the CPU will compare eax or i, with the value of z, which is in the stack.

All of this happen on every round.

If you optimized the code using gcc -O2 example.c -o example, the result will be:

 0x00401b70 <+0>:     push   ebp
 0x00401b71 <+1>:     mov    ebp,esp
 0x00401b73 <+3>:     push   ebx
 0x00401b74 <+4>:     and    esp,0xfffffff0
 0x00401b77 <+7>:     sub    esp,0x10
 0x00401b7a <+10>:    call   0x4018a8 <__main>
 0x00401b7f <+15>:    xor    ebx,ebx
 0x00401b81 <+17>:    lea    esi,[esi+0x0]
 0x00401b84 <+20>:    mov    DWORD PTR [esp+0x4],ebx
 0x00401b88 <+24>:    mov    DWORD PTR [esp],0x403064
 0x00401b8f <+31>:    call   0x401ae0 <printf>
 0x00401b94 <+36>:    inc    ebx
 0x00401b95 <+37>:    cmp    ebx,0xa                     ; (1)
 0x00401b98 <+40>:    jne    0x401b84 <main+20>
 0x00401b9a <+42>:    xor    eax,eax
 0x00401b9c <+44>:    mov    ebx,DWORD PTR [ebp-0x4]
 0x00401b9f <+47>:    leave
 0x00401ba0 <+48>:    ret
 0x00401ba1 <+49>:    nop
 0x00401ba2 <+50>:    nop
 0x00401ba3 <+51>:    nop

The compiler knows that there is no point of checking the value of z, so it modifies the code to something like for( i = 0 ; i < 10 ; i++ ).

In case the compiler doesn't konw the value of z like in this code:

#include <stdio.h>

void loop(unsigned int z) {
    unsigned int i;
    for( i = 0 ; i < z ; i++ )
        printf("%d\n", i);
}

int main(int argc, char **argv)
{
    unsigned int z = 10;

    loop(z);

    return 0;
}

The result will be:

0x0040138c <+0>:     push   esi
0x0040138d <+1>:     push   ebx
0x0040138e <+2>:     sub    esp,0x14
0x00401391 <+5>:     mov    esi,DWORD PTR [esp+0x20] ; (1)
0x00401395 <+9>:     test   esi,esi
0x00401397 <+11>:    je     0x4013b1 <loop+37>
0x00401399 <+13>:    xor    ebx,ebx                  ; (2)
0x0040139b <+15>:    nop
0x0040139c <+16>:    mov    DWORD PTR [esp+0x4],ebx
0x004013a0 <+20>:    mov    DWORD PTR [esp],0x403064
0x004013a7 <+27>:    call   0x401b0c <printf>
0x004013ac <+32>:    inc    ebx
0x004013ad <+33>:    cmp    ebx,esi
0x004013af <+35>:    jne    0x40139c <loop+16>
0x004013b1 <+37>:    add    esp,0x14
0x004013b4 <+40>:    pop    ebx
0x004013b5 <+41>:    pop    esi
0x004013b6 <+42>:    ret
0x004013b7 <+43>:    nop

z will endup in some unused register esi, registers are the fastest storage classed.
There is no local variable i, on the stack, the compiler used ebx to store the value of i, also register.

After all, it depends on the compiler and the optimization options you use, but, in all cases, C still faster, much faster, than VB.

Upvotes: 1

Steve Jessop

Reputation: 279425

This is called "hoisting" SlowVariabele out of the loop.

The compiler can do it only if it can prove that the value of SlowVariabele is the same every time, and that evaluating SlowVariabele has no side-effects.

So for example consider the following code (I assume for the sake of example that accessing through a pointer is "slow" for some reason):

void foo1(int *SlowVariabele, int *result) {
    for (int i = 0; i < *SlowVariabele; ++i) {
       --*result;
    }
}

The compiler cannot (in general) hoist, because for all it knows it will be called with result == SlowVariabele, and so the value of *SlowVariabele is changing during the loop.

On the other hand:

void foo2(int *result) {
    int val = 12;
    int *SlowVariabele = &val;
    for (int i = 0; i < *SlowVariabele; ++i) {
       --*result;
    }
}

Now at least in principle, the compiler can know that val never changes in the loop, and so it can hoist. Whether it actually does so is a matter of how aggressive the optimizer is and how good its analysis of the function is, but I'd expect any serious compiler to be capable of it.

Similarly, if foo1 was called with pointers that the compiler can determine (at the call site) are non-equal, and if the call is inlined, then the compiler could hoist. That's what restrict is for:

void foo3(int *restrict SlowVariabele, int *restrict result) {
    for (int i = 0; i < *SlowVariabele; ++i) {
       --*result;
    }
}

restrict (introduced in C99) means "you must not call this function with result == SlowVariabele", and allows the compiler to hoist.

Similarly:

void foo4(int *SlowVariabele, float *result) {
    for (int i = 0; i < *SlowVariabele; ++i) {
       --*result;
    }
}

The strict aliasing rules mean that SlowVariable and result must not refer to the same location (or the program has undefined behaviour anyway), and so again the compiler can hoist.

Upvotes: 6

Anton Kovalenko

Reputation: 21517

Generally, variables can't be slow (or fast) unless they are mapped to some weird kind of memory (you usually want to declare them volatile in this case).

But indeed, using a local variable creates more opportunities for optimization, and the effect may be very visible. The compiler can "cache" a global variable by itself only if it's able to prove that no function called within a loop can read or write that global variable. When you call an external function within a loop, the compiler probably won't be able to prove such a thing.

Upvotes: 2

junix

Reputation: 3221

Actually it depends on "SlowVariable" and on the behavior of your compiler. If your slow variable is e.g. volatile the compiler won't do any effort to cache it as the keyword volatile won't permit it. If it's not "volatile" there is a good chance that the compiler optimizes consecutive accesses to this variable by loading it once into the register.

Upvotes: 0

unwind

Reputation: 400129

"It" (the language) doesn't say. It must behave as if the variable is evaluated every time, of course.

An optimizing compiler can do a lot of clever things, so it's always best to leave these sorts of micro-optimizations to the compiler.

If you're going down the optimization by hand route, be sure to profile (=measure) and read the generated code.

Upvotes: 0

Ivaylo Strandjev

Reputation: 71009

It depends on your compiler, but I believe most of the contemporary compilers will optimize that for you if the value of SlowVariable is constant.

Upvotes: 0

Does GCC cache loop variables?

Answers (6)

Related Questions