Reputation: 12742
I just came across this article: Compute the minimum or maximum of two integers without branching
It starts with "[o]n some rare machines where branching is expensive...".
I used to think that branching is always expensive as it often forces the processor to clear and restart its execution pipeline (e.g. see Why is it faster to process a sorted array than an unsorted array?).
This leaves me with a couple of questions:
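1. Did the writer of the article get that part wrong? Or was this article maybe written in a time before branching was an issue (I can't find a date on it).
2. Do modern processors have a way to complete minimal branches like the one in (x < y) ? x : y without performance degradation?
3. Or do all modern compilers simply implement this hack automatically? Specifically, what does Java do? Especially since its Math.min(...) function is just that ternary statement...
Upvotes: 3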
Views: 1474
Reputation: 46422
Did the writer of the article get that part wrong? Or was this article maybe written in a time before branching was an issue (I can't find a date on it).
The oldest comment is 5 years old, so this is not hot news. However, unpredictable branching is always expensive, and it was 5 years ago as well. Since then it has only gotten worse: modern CPUs can do much more per cycle, so a mispredicted branch costs more lost work.
But in a sense, the writer is right. The majority of CPUs are found not in our PCs and servers but in embedded devices, where the situation differs.
Do modern processors have a way to complete minimal branches like the one in (x < y) ? x : y without performance degradation?
Yes and no. AFAIK Math.max always gets translated into a conditional move, which means no branching. Your own max may or may not use one, depending on the statistics the JVM has collected.
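To make the three source-level variants concrete, here is a minimal sketch. The class and method names (MinVariants, minTernary, minBranchless) are made up for illustration, and the branch-free version uses a widened subtraction with a sign mask rather than the exact expression from the linked article. Whether the JIT turns the ternary into a conditional move or a real branch is not visible at this level; the code only shows that all three forms compute the same result.

```java
final class MinVariants {
    // Hand-written ternary, the "copy" of Math.min discussed above.
    static int minTernary(int x, int y) {
        return (x < y) ? x : y;
    }

    // Branch-free variant (an illustrative assumption, not the article's exact hack):
    // widen to long so x - y cannot overflow, then use the sign of the
    // difference as an all-ones or all-zeros mask.
    static int minBranchless(int x, int y) {
        long diff = (long) x - y;   // negative exactly when x < y
        long mask = diff >> 63;     // -1L if x < y, otherwise 0L (arithmetic shift)
        return (int) (y + (diff & mask));
    }

    public static void main(String[] args) {
        int[][] pairs = {
            {3, 5}, {5, 3}, {-7, 2}, {4, 4},
            {Integer.MIN_VALUE, Integer.MAX_VALUE}
        };
        for (int[] p : pairs) {
            // Prints the result of each variant side by side for comparison.
            System.out.println(Math.min(p[0], p[1]) + " "
                    + minTernary(p[0], p[1]) + " "
                    + minBranchless(p[0], p[1]));
        }
    }
}
```

If you want to know which one is actually faster on your machine, you would have to benchmark it with both predictable and unpredictable inputs, since the answer depends on the data.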
There's no silver bullet. With predictable outcomes, branching is faster. Finding out exactly which patterns the CPU recognizes is hard. The JVM simply looks at how often a branch gets taken and uses a magic threshold of about 18%. See my own question and answer for details.
Or do all modern compilers simply implement this hack automatically? Specifically, what does Java do? Especially since its Math.min(...) function is just that ternary statement...
It's actually a compiler intrinsic. Whenever the JIT compiler sees a call to this very method, it handles it specially. When you copy the method, the copy gets no special treatment.
In this case, the intrinsic is not very useful, as this is something that gets heavily optimized anyway. For methods like Long#numberOfLeadingZeros, the intrinsic is essential, as the code is rather long and slow and modern CPUs can do it in a single cycle.
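As a rough illustration of why the intrinsic matters there, compare the library call with a hand-written version. The class LeadingZeros and the method nlzLoop are hypothetical names, and the loop is a deliberately naive sketch, not the JDK's actual fallback code. A copy like this is just ordinary Java to the JIT, so it would not be expected to get the special treatment.

```java
final class LeadingZeros {
    // Naive hand-written count of leading zero bits (illustrative only):
    // shift left until the top bit is set, counting the shifts.
    static int nlzLoop(long value) {
        if (value == 0) {
            return 64; // all 64 bits are zero
        }
        int count = 0;
        while ((value & 0x8000000000000000L) == 0) {
            value <<= 1;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, 255L, 1L << 40, -1L};
        for (long s : samples) {
            // Library call (the intrinsic candidate) next to the hand-written loop.
            System.out.println(Long.numberOfLeadingZeros(s) + " " + nlzLoop(s));
        }
    }
}
```

Both columns should agree for every input; the difference is only in what machine code the JIT is likely to emit for each.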
Upvotes: 6