Does this approach positvely impact branch prediction?

Question

I've been trying to make a type of callback function, and in the process I realized the one I prefer, might also change how the CPU handles branch prediction under the hood. I understand that over-optimizing is something one should avoid, especially early on, but this is an approach that suits my use very well, and having the effect of avoiding cache misses would be a neat bonus.

I don't understand CPU-architecture enough to answer this by myself however, so I'm curious if someone maybe knows if this impacts how branch prediction is handled, and if it does so in a beneficial way.

My initial solution would look like this: (mind you with larger blocks)

enum _myenum { t1, t2, t3 }myenum; //enumerator and switch 

switch (myenum){
case t1: 
    { /*a block*/ }
    break;
case t2: 
    { /*another block*/ }
    break;
case t3: 
    { /*yet another block*/ }
    break;
default: break;
}

And the solution that suits my use:

enum _myenum { t1, t2, t3 }myenum; //same enum
//a map of lambdas/functions
std::map<_myenum, std::function>lambda_map{ 
    { t1, [&]() {/*a block*/} },
    { t2, [&]() {/*another block*/ }},
    { t3, [&]() {/*yet another block*/ }}
}; 

lambda_map[myenum](); //to run the callback

Where I will run the lambda_map[myenum](); instead of the harder-to-track switch function.

But does the lambda-pointer map save the CPU from potential cache misses? Like as far as I can gather, not having to predict if statements or run through switch cases, but instead jumping to the specific function based on the set myenum variable, should accomplish that, right? Or at least influence it somehow.

I'm curious if a lambda_map[enum](); could be better on branch-prediction and avoiding cache misses, than a switch statement going through a list of possible cases.

Nelfeal · Accepted Answer

Where I will run the lambda_mapmyenum; instead of the harder-to-track switch function.

This is probably what is misleading you. A switch statement is not necessarily "harder to track" than some map-like structure. In fact, because of compiler optimization, it may very well be implemented as a jump table, that is an array of jump instructions, similar to what you imagine doing with your second solution.

In you case, using a std::map is pretty much guaranteed to be slower because of the indirections (and cache misses that go with them), and because it uses binary search to find a value. What you want is an array (std::array) of function pointers (use std::function or lambdas), using the enumerators as indices. In cases where an array is not feasable (because of the nature of the keys), a hashmap (std::unordered_map) can be used instead.

Performance-wise, there is no way to tell, without profiling, whether the array-of-functions method beats a compiler-optimized switch.

With that being said, I often find myself writing arrays or hashmaps (of functions or not) in place of switch statement due to personal preferences ranging from style (block nesting, placement, ...) to practical maintainability. Also because I dislike the structure of switch statements. Take this last part with a grain of salt: I'm sure there are people who prefer switch-like structures and despise the alternative.

Does this approach positvely impact branch prediction?

Answers (1)

Related Questions