Reputation: 3297
I was reading the Wikibooks page on optimization and I came across the line:
For pipelined processors, comparisons are slower than differences, because they imply a branch.
Why is it the case that comparisons imply a branch? For example if:
int i = 2;
int x = i<5;
Is there a branch in this? It makes sense to me to branch for if statements with conditionals but I don't understand why a comparison alone causes a branch.
Upvotes: 1
Views: 840
Reputation: 24936
Preamble: Modern compilers are capable of eliminating branches in various ways. Thus, none of the examples necessarily results a branch in the final (assembler or machine) code.
So why does the logic basically imply branches?
The code
bool check_interval_branch(int const i, int const min_i, int const max_i)
{
return min_i <= i && i <= max_i;
}
can be logically rewritten to be:
bool check_interval_branch(int const i, int const min_i, int const max_i)
{
if (min_i <= i)
{
if (i < max_i) return true;
}
return false;
}
Here you obviously have two branches (where the second one is only executed if the first one is true - short circuit) which can be mispredicted by the branch predictor which in turn leads to a reroll of the pipeline.
Visual Studio 2013 (with optimization turned one) generates the following assembly containing two branches for check_interval_branch
:
push ebp
mov ebp, esp
mov eax, DWORD PTR _i$[ebp]
cmp DWORD PTR _min_i$[ebp], eax // comparison
jg SHORT $LN3@check_inte // conditional jump
cmp eax, DWORD PTR _max_i$[ebp] // comparison
jg SHORT $LN3@check_inte // conditional jump
mov al, 1
pop ebp
ret 0
$LN3@check_inte:
xor al, al
pop ebp
ret 0
The code
bool check_interval_diff(int const i, int const min_i, int const max_i)
{
return unsigned(i - min_i) <= unsigned(max_i - min_i);
}
is logically identical to
bool check_interval_diff(int const i, int const min_i, int const max_i)
{
if (unsigned(i – min_i) <= unsigned(max_i – min_i)) { return true; }
return false;
}
which contains only a single branch but executes two differences.
The generated code for check_interval_diff
of Visual Studio 2013 doesn't even contain a conditional jump.
push ebp
mov ebp, esp
mov edx, DWORD PTR _i$[ebp]
mov eax, DWORD PTR _max_i$[ebp]
sub eax, DWORD PTR _min_i$[ebp]
sub edx, DWORD PTR _min_i$[ebp]
cmp eax, edx // comparison
sbb eax, eax
inc eax
pop ebp
ret 0
(The trick here is that the subtraction done by sbb
is different by 1 according to the carry flag which in turn is set to 1 or 0 by the cmp
instruction.)
In fact you see three differences here (2x sub
, 1x sbb
).
See Mysticals answer here about branch prediction.
Your code int x = i<5;
is logically identical to
int x = false;
if (i < 5)
{
x = true;
}
which again contains a branch (x = true
only executed if i < 5
.)
Upvotes: 3
Reputation: 153909
Who ever wrote that page is not competent as a programmer. First,
comparisons don't necessarily imply a branch; it depends on what you
do with them. And whether that implies a branch or not depends on the
processor and the compiler. An if
generally requires a branch, but
even then, a good optimizer can sometimes avoid it. A while
or a
for
will generally require a branch, unless the compiler can unwind
the loop, but that branch is highly predictable, so even when branch
prediction is an issue, it may not matter.
More generally, if you worry about anything at this level when writing your code, you're wasting your time, and making maintenance far more difficult. The only time you should be concerned is once you have a performance problem, and the profiler shows that this is the spot where you're loosing performance. At that point, you can experiment with several different ways of writing the code, to determine which one will result in faster code for your combination of compiler and hardware. (Change the compiler or the hardware, and it might not be the same one.)
Upvotes: 1
Reputation: 23789
While the answers given here are good, not all comparisons are translated into branch instructions (they do introduce data dependencies which may also cost you some performance).
For example, the following C code
int main()
{
volatile int i;
int x = i<5;
return x;
}
Is compiled by gcc (x86-64, Optimizations enabled) into:
movl -4(%rbp), %eax
cmpl $5, %eax
setl %al
movzbl %al, %eax
The setl
instruction sets the value of AL
according to the result of the comparison instruction preceding it.
Of course, this is a very simple example -- and the cmp/setl
combination probably introduces dependencies that prevent the processor for executing them in parallel or may even cost you a few cycles.
Still, on a modern processor, not all comparisons are turned into branch instructions.
Upvotes: 3
Reputation: 29981
This involves only a single branch:
unsigned(i – min_i) <= unsigned(max_i – min_i)
While this involves two:
min_i <= i && i <= max_i
When a CPU encounters a branch, it consults its predictor and follows the most likely path. If the prediction is correct, the branch is essentially free in terms of performance. If the prediction is wrong, the CPU needs to flush the pipeline and start all over.
This kind of optimization is a double-edged sword --- if your branches are highly predictable, the first might actually run slower than the second. It entirely depends on how much you know about your data.
Upvotes: 3