owagh
owagh

Reputation: 3528

How large is the branch prediction buffer for a typical modern CPU?

The application that I'm dealing with has a large number of a if-statements with the characteristics that in any one execution, only one of the branches is executed 90% of the time.

Now, I can test the impact of branch prediction on a single if-statement for a specific CPU by doing something like this :-

#include <iostream>
#include <stdlib.h>

using namespace std;

int main() {
  int a;
  cin>>a;
  srand(a);
  int b;

  long count=0;

  for (int i=0; i<10000; i++) {
    for (int j=0; j<65535; j++) {
      b = rand() % 30 + 1;
      if (b > 15) // This can be changed to get statistics for different %-ages
        count += (b+10);
    }
  }

  cout << count <<"\n";
}

My question is, is there a way of testing the scalability and impact of branch prediction with multiple if-statements in an actual large application for a given CPU?

Basically, I want to be able to figure out how much branch mispredicts are costing on various CPUs and their impact on the application.

Upvotes: 5

Views: 560

Answers (1)

Louis Ricci
Louis Ricci

Reputation: 21106

You need to take into a account the complexity of your branches, the compiler may remove branches using architecture specific operation codes like CMOV (compare and move).

Your simple example code

if (b > 15)
    count += (b+10);

Here's the code compiled into machine language

;; assembly x86 FASM/NASM syntax

;; WITH branching
MOV ebx, [b] ;; b
MOV ecx, [count] ;; count
CMP ebx, 15 ;; if condition to set flags
JLE .skip ;; { branch/jump over the if body when less than or equal
LEA eax, [ecx + ebx + 10] ;; count + b+10
MOV [count], eax ;; store count
.skip: ;; } label after the if block

;; WITHOUT branching
MOV ebx, [b] ;; b
MOV ecx, [count] ;; count
LEA eax, [ecx + ebx + 10] ;; pre-calc avoiding the need to branch
CMP ebx, 15 ;; if condition to set flags
CMOVLE eax, ecx ;; make eax equal to ecx (current count) when less than or equal
            ;; avoiding the branch/jump
MOV [count], eax ;; store count

So unless you know how your optimizing compiler is optimizing your code, it's a little difficult to profile branch prediction. If you are checking your machine code output and know you have a lot of J[condition] statements then using the code profiling tool mentioned in the comments are sufficient. Trying to roll your own branch prediction test without using the proper architecture debugging registers will lead to the situation I demonstrated above.

Upvotes: 4

Related Questions