How can you optimize jump instructions slowed by CPU branch misses?

Background

It is well known that unpredictable branches carry significant overhead, and there is a post on SO answering that question.

A similar performance impact can be seen with jump instructions on many CPU architectures, and there is an SO post on that topic as well.

So if you use programming patterns like function pointers, or just ordinary C++ virtual function calls through an inheritance hierarchy, you have to pay the cost of branch misses.
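To make the two patterns concrete, here is a minimal sketch of the kinds of call sites in question. The types and function names (`Shape`, `Triangle`, `Square`, `total_sides`, `apply`, `twice`) are hypothetical and only serve as illustration. In both cases the call target is only known at run time, so the CPU has to predict it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example types, not taken from any real code base.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;  // dispatched through the vtable
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

struct Square : Shape {
    int sides() const override { return 4; }
};

// Indirect call via virtual dispatch: the target depends on the
// dynamic type of each element, which the compiler cannot see here.
int total_sides(const Shape* const* shapes, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        assert(shapes[i] != nullptr);
        total += shapes[i]->sides();
    }
    return total;
}

using Callback = int (*)(int);

int twice(int x) { return 2 * x; }

// Indirect call via a function pointer: the target is whatever
// address the caller passes in.
int apply(Callback fn, int x) { return fn(x); }
```

If the array mixes `Triangle` and `Square` objects in no particular order, the call inside the loop is the kind of site where the target looks random to the predictor.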

Even the most advanced hardware branch predictors can only predict from branch address history (possibly shared globally), and perhaps speculatively fetch the code at the predicted target address, and so on.

But by definition, it won't work for the first execution.

Many embedded appliances, smartphones, etc. demand maximum performance at

- boot time
- first execution of an application such as a browser

These make millions of function calls, and the developers may not want to change their software architecture significantly, for example by converting all indirect jumps into direct jumps.

Suppose the conditions are as follows.

Conditions:

The code must run at top speed
- at the first run
  or
- when the jump looks completely random to the CPU

Is the following the best way to achieve that?

I would also like to know of any examples that rewrite indirect jumps to direct jumps, either dynamically or statically.

How to get maximum performance:

- always use branch likely or unlikely hints
- use prelink
- use mlock or readahead, and cacheflush(ICACHE) for the called functions
- rewrite indirect jumps to direct jumps, dynamically or statically
  (I found a paper written in 1996 by Bradley M Kuhn on static rewriting for some cases)
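As an illustration of the likely/unlikely item in the list above, here is a minimal sketch assuming GCC or Clang, where `__builtin_expect` is available. The `LIKELY`/`UNLIKELY` macros and the `checked_div` function are names made up for this example. Note that these hints influence code layout for conditional branches; they do not tell the CPU where an indirect call will land.

```cpp
// LIKELY/UNLIKELY are hypothetical macro names for this sketch.
// __builtin_expect is a GCC/Clang builtin; other compilers fall
// back to the plain condition.
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

// Division with a rarely taken error path. The hint lets the
// compiler place the common path on the fall-through side.
int checked_div(int a, int b) {
    if (UNLIKELY(b == 0)) {
        return 0;  // assumed error value for this example
    }
    return a / b;
}
```

Whether the hint helps on a cold first run depends on the target CPU and compiler, so it is worth measuring rather than assuming.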

The paper I found translates virtual function calls into static function calls at the source code level, but binary link-time optimization seems better from the software developer's point of view.
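For comparison, a source-level version of this idea can be sketched in plain C++. This is not the method from the paper, only an illustration with hypothetical names (`Codec`, `FastCodec`, `decode_*`). It relies on two language rules: a class marked `final` cannot be overridden further, so a call through its static type can be bound directly, and a qualified call such as `c.FastCodec::decode(x)` never uses virtual dispatch.

```cpp
// Hypothetical interface and implementation for illustration only.
struct Codec {
    virtual ~Codec() = default;
    virtual int decode(int x) const = 0;
};

// 'final' guarantees no class derives from FastCodec, so the
// compiler may bind calls through FastCodec& or FastCodec* directly.
struct FastCodec final : Codec {
    int decode(int x) const override { return x + 1; }
};

// Static type is FastCodec: eligible for a direct call or inlining.
int decode_known(const FastCodec& c, int x) { return c.decode(x); }

// Qualified call: explicitly non-virtual by the language rules.
int decode_qualified(const FastCodec& c, int x) {
    return c.FastCodec::decode(x);
}

// Static type is only Codec: an indirect call unless the optimizer
// can prove the dynamic type.
int decode_unknown(const Codec& c, int x) { return c.decode(x); }
```

Whether the compiler actually emits a direct call for `decode_known` depends on the compiler and optimization level, so checking the generated assembly is the only way to be sure.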

Upvotes: 3

Answers (3)

Danny

Reputation: 405

For function pointers and virtual functions you will need to predict the target of the branch, because you can't know at compile time which code needs to be executed next.

http://en.wikipedia.org/wiki/Branch_target_predictor

Upvotes: 1

Nikolai Fetissov

Reputation: 84189

CPUs are already doing this for you, see Branch Prediction.

Upvotes: 1

StilesCrisis

Reputation: 16290

Are you saying you can predict with near-certainty where the branch will go, but the CPU cannot? A heavy cost is incurred when branch prediction fails; if your branch prediction table is hot and it's hitting the same path every time, I posit that branch prediction would likely succeed. (And if it's not consistent, then obviously prediction cannot succeed 100%.)

Upvotes: 2
