Reputation: 41
There are cases where (logically at least) it makes no difference if I leave out the else keyword, for example:
int func(int num)
{
    if (num == 10) return 99999;
    else return -1;
}
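For comparison, here is a minimal sketch (my own illustration, not from the question) of the same function with the else left out; because the first branch returns, the behavior is identical:

// Equivalent version without else: the early return already ends the
// function, so the else is logically redundant.
int func_no_else(int num)
{
    if (num == 10) return 99999;
    return -1;
}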
The question is: does the presence of else make any difference performance-wise in less trivial cases than this? I know of the existence of the branch predictor and the pipeline; are they affected by the absence or presence of else? Maybe it doesn't even matter since compilers might just optimize it away anyway, but I'm curious.
I've been spending my time on sites that let you practice solving problems. Almost all of the problems have trivial cases; for example, when writing an isPrime() function you might want to filter out numbers <= 1 first and only then jump to the business logic, as in the sketch below. That's where the curiosity came from.
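A minimal isPrime() sketch (my own example, just to make the early-return pattern concrete):

// Trivial cases are filtered out up front with early returns,
// then the loop does the real work via trial division.
bool isPrime(int n)
{
    if (n <= 1) return false;          // trivial case handled first
    for (int i = 2; i * i <= n; ++i)
        if (n % i == 0) return false;  // found a divisor
    return true;
}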
Upvotes: 2
Views: 204
Reputation: 365537
CPUs run machine code, not C++ source. In machine code we only have "goto" (unconditional jumps) and if () goto conditional branches. This else doesn't make a difference to the actual program logic, so you'd expect either version to compile to the same machine code. (And if it doesn't, the worse one is a missed-optimization compiler defect.)
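As a rough illustration (an assumption about typical lowering, not actual compiler output), either source version boils down to the same goto-style shape:

// Goto-level picture of the machine code: one conditional branch,
// then two straight-line return paths. Illustrative only.
int func_lowered(int num)
{
    if (num != 10) goto not_ten;  // conditional branch
    return 99999;                 // fall-through path
not_ten:
    return -1;
}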
But source-code structure (such as hiding something behind an else vs. putting it on the main path after an if) might or might not affect the compiler's heuristics for guessing which path is taken more often, and thus for laying that path out as the fast path with fewer taken branches, in the absence of [[likely]] / [[unlikely]] attributes or profile-guided optimization info. Compilers have various heuristics for guessing whether to optimize for the case where an if is normally taken or normally not taken, e.g. looking at the condition and whether it's comparing a variable to a small or large number, especially if the compiler has any value-range info on the variable being compared.
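If you do want to steer that layout decision explicitly, C++20's [[likely]] / [[unlikely]] attributes can be placed on the branch statements; a minimal sketch (my own example, assuming C++20):

// C++20 attributes hint which branch the compiler should treat as the
// fast (fall-through) path when laying out the machine code.
int func_hinted(int num)
{
    if (num == 10) [[unlikely]]
        return 99999;  // hinted as the cold path
    return -1;         // expected common case
}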
See Ciro's answer showing the effect on branch layout. Note that this is unrelated to runtime branch prediction, except on older CPUs where static prediction (forwards not-taken / backwards taken) is sometimes used when dynamic predictors are cold. With IT-TAGE predictors (at least in Intel since Haswell), that doesn't happen; they always predict something. Taken branches in machine code are a bit less efficient than fall-through (e.g. falling into an if body) because the front-end isn't just fetching contiguous code.
But there's no direct or simple connection between source layout and machine-code layout; it's the compiler's job to optimize branch layout, not just to transliterate C into asm, so you're going to have to look at the asm (e.g. on https://godbolt.org/) if you want to know what the compiler did. See also How to remove "noise" from GCC/clang assembly output?
If you're interested in micro-optimizations, see What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand? and read up on details of recent CPUs, like Chips & Cheese's write-up of Zen 4. https://chipsandcheese.com/2022/11/05/amds-zen-4-part-1-frontend-and-execution-engine/ And older articles like David Kanter's excellent deep dives into Sandybridge and Haswell. Also links in https://stackoverflow.com/tags/x86/info such as Agner Fog's excellent microarchitecture guides: https://www.agner.org/optimize/
For basics, see Modern Microprocessors A 90-Minute Guide!
Upvotes: 2