Reputation: 9518
When I have a loop like:
for (int i = 0; i < SlowVariable; i++)
{
//
}
I know that in VB6 the SlowVariable
is accessed every iteration of the loop, making the following much more efficient:
int cnt = SlowVariable;
for (int i = 0; i < cnt; i++)
{
//
}
Do I need the make the same optimizations in GCC? Or does it evaluate SlowVariable
only once?
Upvotes: 2
Views: 530
Reputation:
This depends on how the compiler optimize, for example here:
#include <stdio.h>
int main(int argc, char **argv)
{
unsigned int i;
unsigned int z = 10;
for( i = 0 ; i < z ; i++ )
printf("%d\n", i);
return 0;
}
If you compiled it using gcc example.c -o example
, the result code will be:
0x0040138c <+0>: push ebp
0x0040138d <+1>: mov ebp,esp
0x0040138f <+3>: and esp,0xfffffff0
0x00401392 <+6>: sub esp,0x20
0x00401395 <+9>: call 0x4018f4 <__main>
0x0040139a <+14>: mov DWORD PTR [esp+0x18],0xa
0x004013a2 <+22>: mov DWORD PTR [esp+0x1c],0x0
0x004013aa <+30>: jmp 0x4013c4 <main+56>
0x004013ac <+32>: mov eax,DWORD PTR [esp+0x1c]
0x004013b0 <+36>: mov DWORD PTR [esp+0x4],eax
0x004013b4 <+40>: mov DWORD PTR [esp],0x403064
0x004013bb <+47>: call 0x401b2c <printf>
0x004013c0 <+52>: inc DWORD PTR [esp+0x1c]
0x004013c4 <+56>: mov eax,DWORD PTR [esp+0x1c] ; (1)
0x004013c8 <+60>: cmp eax,DWORD PTR [esp+0x18] ; (2)
0x004013cc <+64>: jb 0x4013ac <main+32>
0x004013ce <+66>: mov eax,0x0
0x004013d3 <+71>: leave
0x004013d4 <+72>: ret
0x004013d5 <+73>: nop
0x004013d6 <+74>: nop
0x004013d7 <+75>: nop
i
will be movied from the stack into eax
.eax
or i
, with the value of z
, which is in the stack.All of this happen on every round.
If you optimized the code using gcc -O2 example.c -o example
, the result will be:
0x00401b70 <+0>: push ebp
0x00401b71 <+1>: mov ebp,esp
0x00401b73 <+3>: push ebx
0x00401b74 <+4>: and esp,0xfffffff0
0x00401b77 <+7>: sub esp,0x10
0x00401b7a <+10>: call 0x4018a8 <__main>
0x00401b7f <+15>: xor ebx,ebx
0x00401b81 <+17>: lea esi,[esi+0x0]
0x00401b84 <+20>: mov DWORD PTR [esp+0x4],ebx
0x00401b88 <+24>: mov DWORD PTR [esp],0x403064
0x00401b8f <+31>: call 0x401ae0 <printf>
0x00401b94 <+36>: inc ebx
0x00401b95 <+37>: cmp ebx,0xa ; (1)
0x00401b98 <+40>: jne 0x401b84 <main+20>
0x00401b9a <+42>: xor eax,eax
0x00401b9c <+44>: mov ebx,DWORD PTR [ebp-0x4]
0x00401b9f <+47>: leave
0x00401ba0 <+48>: ret
0x00401ba1 <+49>: nop
0x00401ba2 <+50>: nop
0x00401ba3 <+51>: nop
z
, so it modifies the code to something like for( i = 0 ; i < 10 ; i++ )
.In case the compiler doesn't konw the value of z
like in this code:
#include <stdio.h>
void loop(unsigned int z) {
unsigned int i;
for( i = 0 ; i < z ; i++ )
printf("%d\n", i);
}
int main(int argc, char **argv)
{
unsigned int z = 10;
loop(z);
return 0;
}
The result will be:
0x0040138c <+0>: push esi
0x0040138d <+1>: push ebx
0x0040138e <+2>: sub esp,0x14
0x00401391 <+5>: mov esi,DWORD PTR [esp+0x20] ; (1)
0x00401395 <+9>: test esi,esi
0x00401397 <+11>: je 0x4013b1 <loop+37>
0x00401399 <+13>: xor ebx,ebx ; (2)
0x0040139b <+15>: nop
0x0040139c <+16>: mov DWORD PTR [esp+0x4],ebx
0x004013a0 <+20>: mov DWORD PTR [esp],0x403064
0x004013a7 <+27>: call 0x401b0c <printf>
0x004013ac <+32>: inc ebx
0x004013ad <+33>: cmp ebx,esi
0x004013af <+35>: jne 0x40139c <loop+16>
0x004013b1 <+37>: add esp,0x14
0x004013b4 <+40>: pop ebx
0x004013b5 <+41>: pop esi
0x004013b6 <+42>: ret
0x004013b7 <+43>: nop
z
will endup in some unused register esi
, registers are the fastest storage classed.i
, on the stack, the compiler used ebx
to store the value of i
, also register.After all, it depends on the compiler and the optimization options you use, but, in all cases, C still faster, much faster, than VB.
Upvotes: 1
Reputation: 279425
This is called "hoisting" SlowVariabele
out of the loop.
The compiler can do it only if it can prove that the value of SlowVariabele
is the same every time, and that evaluating SlowVariabele
has no side-effects.
So for example consider the following code (I assume for the sake of example that accessing through a pointer is "slow" for some reason):
void foo1(int *SlowVariabele, int *result) {
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
The compiler cannot (in general) hoist, because for all it knows it will be called with result == SlowVariabele
, and so the value of *SlowVariabele
is changing during the loop.
On the other hand:
void foo2(int *result) {
int val = 12;
int *SlowVariabele = &val;
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
Now at least in principle, the compiler can know that val
never changes in the loop, and so it can hoist. Whether it actually does so is a matter of how aggressive the optimizer is and how good its analysis of the function is, but I'd expect any serious compiler to be capable of it.
Similarly, if foo1
was called with pointers that the compiler can determine (at the call site) are non-equal, and if the call is inlined, then the compiler could hoist. That's what restrict
is for:
void foo3(int *restrict SlowVariabele, int *restrict result) {
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
restrict
(introduced in C99) means "you must not call this function with result == SlowVariabele
", and allows the compiler to hoist.
Similarly:
void foo4(int *SlowVariabele, float *result) {
for (int i = 0; i < *SlowVariabele; ++i) {
--*result;
}
}
The strict aliasing rules mean that SlowVariable
and result
must not refer to the same location (or the program has undefined behaviour anyway), and so again the compiler can hoist.
Upvotes: 6
Reputation: 21517
Generally, variables can't be slow (or fast) unless they are mapped to some weird kind of memory (you usually want to declare them volatile
in this case).
But indeed, using a local variable creates more opportunities for optimization, and the effect may be very visible. The compiler can "cache" a global variable by itself only if it's able to prove that no function called within a loop can read or write that global variable. When you call an external function within a loop, the compiler probably won't be able to prove such a thing.
Upvotes: 2
Reputation: 3221
Actually it depends on "SlowVariable" and on the behavior of your compiler. If your slow variable is e.g. volatile the compiler won't do any effort to cache it as the keyword volatile won't permit it. If it's not "volatile" there is a good chance that the compiler optimizes consecutive accesses to this variable by loading it once into the register.
Upvotes: 0
Reputation: 400129
"It" (the language) doesn't say. It must behave as if the variable is evaluated every time, of course.
An optimizing compiler can do a lot of clever things, so it's always best to leave these sorts of micro-optimizations to the compiler.
If you're going down the optimization by hand route, be sure to profile (=measure) and read the generated code.
Upvotes: 0
Reputation: 71009
It depends on your compiler, but I believe most of the contemporary compilers will optimize that for you if the value of SlowVariable is constant.
Upvotes: 0