Reputation: 363
OK, so a compiler is free to reorder code fragments for performance reasons. Let's suppose some code snippet, translated directly into machine code with no optimizations applied, looks like this:
machine_instruction_1
machine_instruction_2
machine_instruction_3
machine_instruction_4
machine_instruction_5
but a smart compiler decides the original order is highly inefficient and reorders the same code so that the new order of resulting machine instructions is as follows:
machine_instruction_5
machine_instruction_4
machine_instruction_3
machine_instruction_2
machine_instruction_1
So far so good.
Here's where the tricky part starts. The resulting machine instructions will be executed by a cpu which is free to reshuffle them once again any way it finds appropriate for performance reasons, as long as the code logic is preserved. Since we're dealing with two "layers" of instruction reordering:
what makes the compile-time instruction reordering relevant at all? All the cpu sees is a sequence of raw machine instructions, with no indication of any prior optimizations performed by the compiler. If cpu introduces its own "layer" of reordering, how come it doesn't invalidate the order of instructions set by the compiler? Basically, what forces cpu to respect compiler optimizations? How do compile-time reordering and run-time reordering "cooperate", how does the latter complement the former?
Upvotes: 6
Views: 1963
Reputation: 11547
When considering instruction execution what must be respected is the program semantic. Any ordering is correct, as long as this is respected. Concretely, this is described by "dependencies" that indicate if some instructions require a given order with respect to the program correct behavior. For instance consider the following program
1 x <= y+3
2 z <= 2*x
3 w = 5*y
4 y = 2*a
Instructions 1 and 2 are dependent. If their relative ordering is modified, the program is not compliant with what wanted the programmer and any reordering is forbidden. For different reasons, 4 cannot be executed unchanged before y has been used by 1 and 3. There are different kind of dependancies, including when considering control flow.
Compiler and hardware try to reorder programs in order to improve their efficiency, while respecting dependencies. Indeed their actions is complementary.
Compiler can consider larger reorganisations than processor and uses more complex heuristics to do it. A compiler can have a large view on a program and reorder large part of code. In theory, an instruction at a distance of, say, 1000 can be displaced, if there is no dependency violation and the compiler considers that it can improve program execution. It can completely reorganize code, unroll loops, etc. On the contrary, the processor has a relatively limited window of prefetched instructions that can be considered for reordering and any rearrangement can only concerns close instructions and is based on simple methods doable within a cycle.
But the processor has a great advantage. It can do dynamic reordering and respond to random events. At a given time, it considers what instruction can be executed with respect to dependencies and data availability and reorders the code accordingly. This is based on dynamic dependencies, and, for instance, if the result of a previous dependant instruction is not available because of a cache miss, it will execute other instructions that respect dependencies. Successive runs of the same program with the same input data can lead to different ordering depending on branch mispredictions, cache misses, etc.
So there is no competition between compiler and processor, but an efficient collaboration.
Upvotes: 5