Reputation: 474
I'm writing an algorithm where efficiency is very important. Sometimes I also want to track the algorithm's behaviour by calling some "callback" functions. Let's say my code looks like this:
public float myAlgorithm(AlgorithmTracker tracker) {
    while (something) { // millions of iterations
        doStuff();
        if (tracker != null) tracker.incrementIterationCount(); // <--- How to run the if only once?
        doOtherStuff();
    }
}
How do I prevent the if statement from executing a million times? Does the compiler see that tracker
is never reassigned? If it's null on the first check, it will always be null; if it's not, it never will be.
Ideally, I'd like to tell the compiler to build my code so that if tracker
is null at runtime, the loop runs with the same performance as
while (something) { // millions of iterations
    doStuff();
    doOtherStuff();
}
I thought of two solutions:
I could write two versions of myAlgorithm, one with the calls and one without them, but that would lead to lots of code duplication.
I could extract AlgorithmTracker to an interface and create a no-op tracker with empty methods. Still, I don't know whether the compiler would optimize the calls away.
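A minimal sketch of the second idea, the null-object pattern (names here are illustrative, not from the question): AlgorithmTracker becomes an interface with a shared no-op instance, so the hot loop never needs a null check and callers that don't track simply pass NOOP.

```java
// Sketch of the null-object pattern; names are hypothetical.
interface AlgorithmTracker {
    void incrementIterationCount();

    // Shared no-op instance (interface fields are implicitly static final).
    AlgorithmTracker NOOP = () -> { };
}

class CountingTracker implements AlgorithmTracker {
    long iterations = 0;

    @Override
    public void incrementIterationCount() {
        iterations++;
    }
}

public class Demo {
    // Callers pass AlgorithmTracker.NOOP instead of null when tracking is off.
    static float myAlgorithm(int n, AlgorithmTracker tracker) {
        float acc = 0;
        for (int i = 0; i < n; i++) {   // stands in for "while (something)"
            acc += 1.0f;                // stands in for doStuff()/doOtherStuff()
            tracker.incrementIterationCount();
        }
        return acc;
    }

    public static void main(String[] args) {
        CountingTracker t = new CountingTracker();
        myAlgorithm(1_000, t);
        System.out.println(t.iterations);                              // 1000
        System.out.println(myAlgorithm(1_000, AlgorithmTracker.NOOP)); // 1000.0
    }
}
```

If only the no-op implementation is ever loaded, the JIT can devirtualize and inline the empty call away entirely; with two implementations the call site becomes bimorphic dispatch, which is still cheap.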
Upvotes: 2
Views: 617
Reputation: 8335
On most CPU architectures you don't have to worry about applying this optimization yourself, because it is already built into contemporary CPUs. It is called branch prediction, and current CPUs are very good at it.
On average, roughly every sixth instruction a CPU executes is a branch, and if the CPU had to stop and evaluate the condition for every branch, execution would be much slower.
So when faced with a branch, without evaluating the branch condition, the CPU starts executing (speculative execution) the path it considers most likely to be correct. Later, when the result of the branch condition becomes available, the CPU checks whether it is executing the correct path.
If the path the CPU picked is consistent with the result of the branch condition, the CPU knows it is executing the correct path and keeps going at full speed; otherwise, it has to flush all the speculatively executed instructions and start over on the correct path.
Enter the branch predictor subsystem of a CPU. In its most basic form it stores information about a branch's past behaviour; for example, if a branch has not been taken for some time, it is likely not to be taken now. This is a simplified explanation; a real branch predictor is quite complex.
Given that at their core branch predictors are just pattern-matching machines, if your branch shows a predictable pattern you can rest assured the branch predictor will get it right. But if your branch shows no pattern at all, the branch predictor cannot help you; worse, it will hamper your code's execution because of all the mispredictions.
In your case the value of the branch control variable never changes, so the branch is either taken on every iteration of the loop or never taken. This is a pattern even the most basic branch predictor can discern, which means your code will practically execute as if the condition were not there: after the first few iterations the branch predictor will pick the path with 100% accuracy.
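A classic way to see that a predictable branch costs (almost) nothing is the sorted-vs-unsorted-array experiment. The sketch below is illustrative: the timing effect varies by machine, and a modern JIT may compile this particular branch to a conditional move, hiding it entirely. What it does show reliably is that the branch pattern affects only speed, never the result.

```java
import java.util.Arrays;
import java.util.Random;

public class BranchDemo {
    // Sums all elements >= 128; the branch inside is perfectly predictable
    // for sorted input and essentially random for shuffled input.
    static long sumAboveThreshold(int[] data) {
        long sum = 0;
        for (int v : data) {
            if (v >= 128) {   // taken ~50% of the time for random data
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] random = new Random(42).ints(1_000_000, 0, 256).toArray();
        int[] sorted = random.clone();
        Arrays.sort(sorted);

        // Same multiset of values, same result -- only the branch pattern differs.
        long a = sumAboveThreshold(random);
        long b = sumAboveThreshold(sorted);
        System.out.println(a == b);   // true
    }
}
```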
To learn more, read this great SO thread.
Fun fact: this particular optimization is the reason for CPU vulnerabilities such as Spectre and Meltdown.
Upvotes: 5
Reputation: 1587
"I could write two versions of myAlgorithm...but that would lead to lots of code duplication"
Yes, this may be a way to optimize performance, and it's one of the rare cases where DRY doesn't apply. Another example of such a repeat-yourself technique is loop unrolling (if your compiler didn't already do it :)). Here, code duplication is the price you pay for better performance.
But as for your particular if: the condition doesn't change inside the loop, so CPU branch prediction should handle it quite well. Write good, correct performance tests (with JMH, for example), and something tells me you will not see any difference from such a pico- (not even micro-) optimization; the result may even be worse, since there are far more important things that affect overall performance.
So, don't worry about this particular if at all.
Performance optimization is a very large and very interesting area. Take care of the RIGHT things and write correct performance tests... And always think about appropriate algorithms and collections; remember classic big-O.
Upvotes: 0
Reputation: 46462
lots of code duplication
Lots of code duplication means there's lots of code. So how could one simple null check influence performance?
Hoisting null checks out of loops is a fairly trivial optimization. There's no guarantee that it gets done, but when the JIT can't do it, the CPU can still perfectly predict the branch outcome.(*) So the branch will cost something like ¼ of a cycle, as current CPUs are capable of executing about four instructions per cycle.
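What this hoisting amounts to can be sketched by hand; this transformation is known as loop unswitching (the names below are illustrative). The invariant null check is evaluated once, and two branch-free loop bodies are kept — exactly the code duplication from the question, but done mechanically:

```java
// Hand-unswitched version of the questioner's loop; names are hypothetical.
public class Unswitched {
    interface Tracker { void incrementIterationCount(); }

    static class CountingTracker implements Tracker {
        long iterations;
        public void incrementIterationCount() { iterations++; }
    }

    // The null check runs once, outside the loop; each loop body is branch-free.
    static float myAlgorithm(int n, Tracker tracker) {
        float acc = 0;
        if (tracker == null) {
            for (int i = 0; i < n; i++) {
                acc += 1.0f;                    // stands in for doStuff()/doOtherStuff()
            }
        } else {
            for (int i = 0; i < n; i++) {
                acc += 1.0f;
                tracker.incrementIterationCount();
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        CountingTracker t = new CountingTracker();
        System.out.println(myAlgorithm(1_000, t) + " " + t.iterations); // 1000.0 1000
        System.out.println(myAlgorithm(1_000, null));                   // 1000.0
    }
}
```

This is a sketch of what the JIT may already do for you, which is why writing it manually is rarely worth the duplication.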
As already said, there's some branching anyway, since the JIT has to emit its own null check before the virtual call. So most probably the net win from this premature optimization is zero.
(*) The prediction can get spoiled by tons of other branches evicting your branch from the predictor (a sort of cache). But then your poor branch is just one of many, and you needn't care.
Upvotes: 0