aaron-prindle
aaron-prindle

Reputation: 3297

Is it the case that comparisons imply a branch?

I was reading the Wikibooks page on optimization and I came across the line:

For pipelined processors, comparisons are slower than differences, because they imply a branch.

Why is it the case that comparisons imply a branch? For example if:

int i = 2;
int x = i<5;

Is there a branch in this? It makes sense to me to branch for if statements with conditionals but I don't understand why a comparison alone causes a branch.

Upvotes: 1

Views: 840

Answers (4)

Pixelchemist
Pixelchemist

Reputation: 24936

Preamble: Modern compilers are capable of eliminating branches in various ways. Thus, none of the examples necessarily results a branch in the final (assembler or machine) code.

So why does the logic basically imply branches?

The code

bool check_interval_branch(int const i, int const min_i, int const max_i)
{
  return min_i <= i && i <= max_i;
} 

can be logically rewritten to be:

bool check_interval_branch(int const i, int const min_i, int const max_i)
{
  if (min_i <= i) 
  { 
    if (i < max_i) return true; 
  }
  return false;
} 

Here you obviously have two branches (where the second one is only executed if the first one is true - short circuit) which can be mispredicted by the branch predictor which in turn leads to a reroll of the pipeline.

Visual Studio 2013 (with optimization turned one) generates the following assembly containing two branches for check_interval_branch:

  push  ebp
  mov   ebp, esp
  mov   eax, DWORD PTR _i$[ebp]
  cmp   DWORD PTR _min_i$[ebp], eax    // comparison
  jg    SHORT $LN3@check_inte          // conditional jump
  cmp   eax, DWORD PTR _max_i$[ebp]    // comparison
  jg    SHORT $LN3@check_inte          // conditional jump
  mov   al, 1
  pop   ebp
  ret   0
$LN3@check_inte:
  xor   al, al
  pop   ebp
  ret   0

The code

bool check_interval_diff(int const i, int const min_i, int const max_i)
{
  return unsigned(i - min_i) <= unsigned(max_i - min_i);
}

is logically identical to

bool check_interval_diff(int const i, int const min_i, int const max_i)
{
  if (unsigned(i – min_i) <= unsigned(max_i – min_i)) { return true; }
  return false;
}

which contains only a single branch but executes two differences.

The generated code for check_interval_diff of Visual Studio 2013 doesn't even contain a conditional jump.

  push  ebp
  mov   ebp, esp
  mov   edx, DWORD PTR _i$[ebp]
  mov   eax, DWORD PTR _max_i$[ebp]
  sub   eax, DWORD PTR _min_i$[ebp]
  sub   edx, DWORD PTR _min_i$[ebp]
  cmp   eax, edx                    // comparison
  sbb   eax, eax
  inc   eax
  pop   ebp
  ret   0

(The trick here is that the subtraction done by sbb is different by 1 according to the carry flag which in turn is set to 1 or 0 by the cmp instruction.)

In fact you see three differences here (2x sub, 1x sbb).

It probably depends on your data / the use case which one is faster.

See Mysticals answer here about branch prediction.

Your code int x = i<5; is logically identical to

int x = false;
if (i < 5)
{
  x = true;
}

which again contains a branch (x = true only executed if i < 5.)

Upvotes: 3

James Kanze
James Kanze

Reputation: 153909

Who ever wrote that page is not competent as a programmer. First, comparisons don't necessarily imply a branch; it depends on what you do with them. And whether that implies a branch or not depends on the processor and the compiler. An if generally requires a branch, but even then, a good optimizer can sometimes avoid it. A while or a for will generally require a branch, unless the compiler can unwind the loop, but that branch is highly predictable, so even when branch prediction is an issue, it may not matter.

More generally, if you worry about anything at this level when writing your code, you're wasting your time, and making maintenance far more difficult. The only time you should be concerned is once you have a performance problem, and the profiler shows that this is the spot where you're loosing performance. At that point, you can experiment with several different ways of writing the code, to determine which one will result in faster code for your combination of compiler and hardware. (Change the compiler or the hardware, and it might not be the same one.)

Upvotes: 1

nimrodm
nimrodm

Reputation: 23789

While the answers given here are good, not all comparisons are translated into branch instructions (they do introduce data dependencies which may also cost you some performance).

For example, the following C code

int main()
{
    volatile int i;
    int x = i<5;

    return x;
}

Is compiled by gcc (x86-64, Optimizations enabled) into:

    movl    -4(%rbp), %eax
    cmpl    $5, %eax
    setl    %al
    movzbl  %al, %eax

The setl instruction sets the value of AL according to the result of the comparison instruction preceding it.

Of course, this is a very simple example -- and the cmp/setl combination probably introduces dependencies that prevent the processor for executing them in parallel or may even cost you a few cycles.

Still, on a modern processor, not all comparisons are turned into branch instructions.

Upvotes: 3

Cory Nelson
Cory Nelson

Reputation: 29981

This involves only a single branch:

unsigned(i – min_i) <= unsigned(max_i – min_i)

While this involves two:

min_i <= i && i <= max_i

When a CPU encounters a branch, it consults its predictor and follows the most likely path. If the prediction is correct, the branch is essentially free in terms of performance. If the prediction is wrong, the CPU needs to flush the pipeline and start all over.

This kind of optimization is a double-edged sword --- if your branches are highly predictable, the first might actually run slower than the second. It entirely depends on how much you know about your data.

Upvotes: 3

Related Questions