Reputation: 4249
C & C++ compilers are allowed to reorder operations as long as the as-if rule holds. What is an example of such a reordering performed by a compiler, and what is the potential performance gain to be had by doing it?
Examples involving any (C/C++) compiler on any platform are welcome.
Upvotes: 9
Views: 2483
Reputation: 16777
Suppose you have the following operations being performed:
int i=0,j=0;
i++;
i++;
i++;
j++;
j++;
j++;
Ignoring for the moment that the three increments would likely be optimized away by the compiler into one +=3
, you will end up having a higher processor-pipeline throughput if you reordered the operations as
i++;
j++;
i++;
j++;
i++;
j++;
since j++
doesn't have to wait for the result of i++
while in the previous case, most of the instructions had a data dependency on the previous instruction. In more complicated computations, where there isn't an easy way to reducing the number of instructions to be performed, the compiler can still look at data dependencies and reorder instructions so that an instruction depending on the result of an earlier instruction is as far away from it as possible.
Another example of such an optimization is when you are dealing with pure functions. Looking at a simple example again, assume you have a pure function f(int x)
which you are summing over a loop.
int tot = 0;
int x;//something known only at runtime
for(int i = 0; i < 100; i++)
tot += f(x);
Since f
is a pure function, the compiler can reorder calls to it as it pleases. In particular, it can transform this loop to
int tot = 0;
int x;//something known only at runtime
int fval = f(x);
for(int i = 0; i < 100; i++)
tot += fval;
Upvotes: 11
Reputation: 215507
Suppose you have a loop like:
for (i=0; i<n; i++) dest[i] = src[i];
Think memcpy
. You might want the compiler to be able to vectorize this, i.e. load 8 or 16 bytes at a time and then store 8 or 16 at a time. Making that transformation is a reordering, since it would cause src[1]
to be read before dest[0]
is stored. Moreover, unless the compiler knows that src
and dest
don't overlap, it's an invalid transformation, i.e. one the compiler is not allowed to make. Use of the restrict
keyword (C99 and later) allows you to tell the compiler that they don't overlap so that this kind of (extremely valuable) optimization is possible.
The same sort of thing arises all the time in operations on arrays that aren't just copying - things like vector/matrix operations, transformations of sound/image sample data, etc.
Upvotes: 4
Reputation: 14677
I'm sure there are quite a few examples where reordering operations will yield faster performance. An obvious example would be to reorder loads as early as possible, since these are typically much slower than other CPU operations. By doing other, unrelated work whilst the memory is being fetched, the CPU can save time overall.
That is, given something like this:
expensive_calculation();
x = load();
do_something(x);
We can reorder it like this:
x = load();
expensive_calculation();
do_something(x);
So while we're waiting for the load to complete, we can essentially do expensive_calculation()
for free.
Upvotes: 5