Reputation: 2707
Ever since I started programming, I have read everywhere that wasteful branches should be avoided at all costs.
That's fine, although none of the articles explained why I should do this. What exactly happens when the CPU decodes a branch instruction and decides to do a jump? And what is the "thing" that makes it slower than other instructions (like addition)?
(editor's note: near-duplicate of Why is processing a sorted array faster than processing an unsorted array?, which is a good example of easy vs. hard-to-predict branches leading to mispredictions, with an answer that explains what's going on.)
Upvotes: 36
Views: 14202
Reputation: 2598
This is not completely related, but advice to avoid branches at all costs is simply nonsense on modern speculative, out-of-order execution processors. Speculative execution is exactly the thing that gives your processor instructions to process while waiting for data from memory, and speculation on branch conditions is what speculative execution is all about. Replacing branches with arithmetic can actually slow down your program, so beware! More about that here.
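For a concrete picture of that trade-off (a minimal sketch with made-up function names, not a benchmark), here are a branchy and an arithmetic version of the same selection; which one wins depends entirely on how predictable the branch is in practice:

// Branchy version of "pick the smaller of two values".
// If the inputs follow a pattern, the branch is predicted well
// (or the compiler emits a cmov) and this is very cheap.
int min_with_branch(int a, int b)
{
    if (a < b)
        return a;
    return b;
}

// Arithmetic version: the comparison result (0 or 1) selects the operand,
// so there is no branch to mispredict -- but every call pays for the extra
// multiplies and adds, even when the branch would have been predicted
// correctly anyway.
int min_without_branch(int a, int b)
{
    const int take_a = (a < b);            // 1 if a is smaller, else 0
    return a * take_a + b * (1 - take_a);  // selects a or b without branching
}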
Upvotes: 1
Reputation: 272537
A branch instruction is not inherently slower than any other instruction.
However, the reason you have heard that branches should be avoided is that modern CPUs follow a pipeline architecture. This means that multiple sequential instructions are being executed simultaneously, each at a different stage. But the pipeline can only be fully utilised if it's able to read the next instruction from memory on every cycle, which in turn means it needs to know which instruction to read.
On a conditional branch, the CPU usually doesn't know ahead of time which path will be taken. So it either has to stall until the condition has been resolved, or it guesses; if the guess turns out to be wrong, it must throw away everything in the pipeline that came after the branch instruction. Either way, this lowers utilisation, and therefore performance.
This is the reason that things like branch prediction and branch delay slots exist.
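As a rough illustration of how much prediction accuracy matters (a sketch in the spirit of the sorted-vs-unsorted question linked under the original post, not a measurement), the loop below contains a branch whose outcome is essentially random for unsorted data but comes in two long, easily predicted runs once the data is sorted:

#include <algorithm>
#include <vector>

// With random values in [0, 255], the branch below is taken about half the
// time in no particular pattern, so the predictor is wrong often and the
// pipeline is flushed on many iterations. If the vector is sorted first,
// the outcomes become "not taken, not taken, ..., taken, taken" and the
// predictor is almost always right, so the same work runs much faster.
long long sum_large_values(std::vector<int>& data, bool sort_first)
{
    if (sort_first)
        std::sort(data.begin(), data.end());

    long long sum = 0;
    for (int v : data)
    {
        if (v >= 128)   // the branch whose predictability we care about
            sum += v;
    }
    return sum;
}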
Upvotes: 50
Reputation: 21458
Oli gave a very good explanation of why branching is expensive: pipelining and branch prediction. I want to add, however, that you shouldn't be too concerned about the issue, as modern compilers will optimize the code, and one of those optimizations is reducing branching.
You can read more about C++ optimizations in the Microsoft compiler here - the Profile-Guided Optimizer uses runtime information (i.e. which parts of the code are executed most often) to optimize your code. The reported speed-up is in the 20% range.
One of those optimizations is "Conditional Branch Optimization". For example, assuming i is 6 most of the time, this is faster:
if (i == 6)
{
    // ...
}
else
{
    switch (i)
    {
        case 1: // ...
        case 2: // ...
        // ...
    }
}
than:
switch (i)
{
    case 1: // ...
    // ...
    case 6: // ...
    case 7: // ...
}
Here is a blog post on other optimizations: http://bogdangavril.wordpress.com/2011/11/02/optimizating-your-native-program/
Upvotes: 3
Reputation: 72655
The CPU uses a pipeline to execute instructions, which means that while one instruction is at some stage of execution (for example, reading values from registers), the next instruction is being processed at the same time, but at another stage (for example, the decoding stage). This is fine for non-control instructions, but it makes things complicated when control instructions like jmp or call are executed.
Since the CPU does not know what the next instruction will be until a jmp instruction has been resolved, it uses branch prediction techniques to predict whether the branch will be taken or not (for example, the branch at the end of a loop will probably take the instruction flow back to the loop head).
However, when such a prediction fails, which is called a branch misprediction, it hurts execution performance, since everything in the pipeline after the branch has to be discarded and execution has to start over from the correct instruction.
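A small sketch of that loop case (illustrative only, with made-up names): the conditional jump that closes a loop is taken on every iteration except the last, so a predictor that simply assumes "taken" is almost always right and the loop-back branch costs next to nothing.

#include <cstddef>

// For a counted loop like this, the compiler typically emits something
// along these lines (x86-ish pseudo-assembly, for illustration):
//
//     loop_top:
//         ... loop body ...
//         add  i, 1
//         cmp  i, n
//         jb   loop_top      ; taken on every iteration except the last
//
// Predicting "taken" is right (n - 1) times out of n, so this branch is
// nearly free; it is data-dependent branches inside the body that are
// hard to predict.
long long sum_array(const int* data, std::size_t n)
{
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];   // no data-dependent branch in this body
    return sum;
}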
Upvotes: 8