Reputation: 1891
Is there a performance difference between cascading if-else statements like
if (i > c20) {
// ...
} else if (i > c19) {
// ...
} else if (i > c18) {
// ...
} else if (i > c17) {
// ...
} else if (i > c16) {
// ...
} else if (i > c15) {
// ...
} else if (i > c14) {
// ...
} else if (i > c13) {
// ...
} else if (i > c12) {
// ...
} else if (i > c11) {
// ...
} else if (i > c10) {
// ...
} else if (i > c9) {
// ...
} else if (i > c8) {
// ...
} else if (i > c7) {
// ...
} else if (i > c6) {
// ...
} else if (i > c5) {
// ...
} else if (i > c4) {
// ...
} else if (i > c3) {
// ...
} else if (i > c2) {
// ...
} else if (i > c1) {
// ...
} else if (i > c0) {
// ...
} else {
// ...
}
and nested if statements like:
if (i > c10) {
if (i > c15) {
if (i > c18) {
if (i > c19) {
if (i > c20) {
// ...
} else {
// ...
}
} else {
//...
}
} else {
if (i > c17) {
// ...
} else {
// ...
}
}
} else {
if (i > c13) {
if (i > c14) {
// ...
} else {
// ...
}
} else {
if (i > c12) {
// ...
} else {
// ...
}
}
}
} else {
if (i > c5) {
if (i > c8) {
if (i > c9) {
//...
} else {
//...
}
} else {
if (i > c7) {
// ...
} else {
// ...
}
}
} else {
if (i > c3) {
if (i > c4) {
// ...
} else {
// ...
}
} else {
if (i > c2) {
// ...
} else {
if (i > c0) {
if (i > c1) {
// ...
}
} else {
// ...
}
}
}
}
}
If there is a difference, what is the reason one is faster than the other? Can one form result in better JIT compilation, a better cache strategy, better branch prediction, better compiler optimisation, etc.? I am particularly interested in the performance in Java, but would be interested in knowing how it might be similar or different in other languages like C/C++, C#, etc.
How would different distributions of i, the ranges checked, and/or a different number of if statements affect the results?
Here the values c0 to c20 are in strictly increasing order, hence creating ranges. E.g.:
c0 = 0;
c1 = 10;
c2 = 20;
c3 = 30;
c4 = 40;
c5 = 50;
c6 = 60;
c7 = 70;
c8 = 80;
c9 = 90;
c10 = 100;
c11 = 110;
c12 = 120;
c13 = 130;
c14 = 140;
c15 = 150;
c16 = 160;
c17 = 170;
c18 = 180;
c19 = 190;
c20 = 200;
or
c0 = 0;
c1 = 1;
c2 = 2;
c3 = 3;
c4 = 4;
c5 = 5;
c6 = 6;
c7 = 7;
c8 = 8;
c9 = 9;
c10 = 10;
c11 = 11;
c12 = 12;
c13 = 13;
c14 = 14;
c15 = 15;
c16 = 16;
c17 = 17;
c18 = 18;
c19 = 19;
c20 = 20;
Upvotes: 1
Views: 966
Reputation: 748
For the cascading if-else, all the expressions get evaluated from top to bottom, in the order they appear, up to the first condition that evaluates to true. So the performance depends on how many expressions have to be evaluated before a condition comes out true.
Taking the example here in your case:
if (i > 0) {
// ...
} else if (i > 1) {
// ...
} else if (i > 2) {
// ...
} else if (i > 3) {
// ...
} else if (i > 4) {
For any integer input i, if its value is less than or equal to 0, every condition in the chain has to be evaluated, whereas an i greater than 0 will exit after the evaluation of the first if itself.
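To make that cost concrete, here is a small sketch that counts evaluated conditions. The classify helper and its loop are illustrative stand-ins for the unrolled if/else-if chain, using the question's second table of constants (c_k = k):

```java
public class CascadeCost {
    static int comparisons;

    // Stand-in for the unrolled if/else-if chain, with the question's
    // second table of constants (c0..c20 = 0..20). Counts evaluated conditions.
    static int classify(int i) {
        comparisons = 0;
        for (int c = 0; c <= 20; c++) {
            comparisons++;
            if (i > c) return c;      // first condition that is true wins
        }
        return -1;                    // every condition was evaluated and false
    }

    public static void main(String[] args) {
        classify(5);
        System.out.println("i=5  -> " + comparisons + " condition(s) evaluated");  // 1
        classify(-5);
        System.out.println("i=-5 -> " + comparisons + " condition(s) evaluated");  // 21
    }
}
```

An input that satisfies the first condition costs one comparison; an input that satisfies none costs all 21.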
The next example, along its innermost path, is equivalent to (and more readable as) a chain of short-circuit operators.
if (i > 10) {
if (i > 15) {
if (i > 18) {
if (i > 19) {
if (i > 20) {
The innermost branch is better written using && as:
if (i > 10 && i > 15 && i > 18 && i > 19 && i > 20) { // for a middle branch: i > lowerBound && i <= upperBound instead
}
This is what the common lint rule "Collapsible if statements should be merged" refers to.
Here, if i > 10 evaluates to true, the rest of the && conditions to the right get evaluated as well. So this form performs best for inputs that make the first expression itself evaluate to false, i.e. when the input i is an integer satisfying i <= 10.
Also, as other answers suggest, try benchmarking the code, using JMH for instance, which will give better insight into how the code performs after optimizations.
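Before benchmarking, it is worth pinning down that the two shapes under test really are equivalent; otherwise you are timing different programs. A minimal sketch over a subset of the thresholds (c0..c4 = 0, 10, 20, 30, 40, assumed from the question's first table):

```java
public class BranchShapes {
    // Cascading if/else over c0..c4 (question's first shape, truncated).
    static int cascading(int i) {
        if (i > 40) return 4;
        else if (i > 30) return 3;
        else if (i > 20) return 2;
        else if (i > 10) return 1;
        else if (i > 0) return 0;
        else return -1;
    }

    // The same ranges checked as a hand-built binary tree of ifs
    // (question's second shape, truncated).
    static int nested(int i) {
        if (i > 20) {
            if (i > 30) {
                return i > 40 ? 4 : 3;
            } else {
                return 2;
            }
        } else {
            if (i > 0) {
                return i > 10 ? 1 : 0;
            } else {
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        // Both shapes classify every input identically; any performance
        // difference between them has to be measured, e.g. with JMH.
        for (int i = -5; i <= 50; i++) {
            if (cascading(i) != nested(i)) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("shapes agree");
    }
}
```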
Upvotes: 0
Reputation: 102812
There are two different factors that play a major role in performance.
Forget about the vagaries of how CPUs work for a moment. Imagine this code:
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
for (int k = 0; k < n; k++) {
doSomething();
}
}
}
Performance of doSomething can and probably will vary wildly even if it is a very simple job (hey, hotspot compilation is a thing!), but even so, if n is 10, this runs doSomething a thousand times. If n is 11, then doSomething is run 1331 times. If we make a chart with n on the x axis and 'how often is doSomething invoked' on the y axis, it'll look like y = x^3.
And that is a chart that grows very fast. Even if doSomething is fast, or has highly irregular performance, for some sufficiently large value of n, performance is going to be horrible simply because of the sheer amount of looping we do here.
The key words are for some sufficiently large value. It means that eventually the core performance characteristic of this code (O(n^3)) will be the only relevant factor.
But a CPU can loop 1331 times in an instant, and the fact that it's 3 loops deep is going to pale in comparison to what doSomething does if n is small. When does n get so large that the sheer number of loops ensures this is going to be slow no matter what happens in doSomething? That depends on your CPU. But it will eventually happen.
Let me put it differently: if we make a different chart, with x-axis = n and y-axis = milliseconds taken by a dedicated CPU to chug through the calculation, then the chart will look like a mess initially, as a random interruption by your Winamp or whatnot causes weird peaks, and a cache miss causes another peak somewhere else, but as the chart goes to the right it starts turning into y = x^3, if only you go far enough to the right.
CPUs also aren't magical voodoo machines that can invent more efficient algorithms. If you write an algorithm that has the O(n^3) characteristic, the compiler, hotspot, and CPU pipeliner are highly unlikely to just make that 'long tail of performance horror' go away, except in trivial cases (where doSomething does literally nothing and hotspot knows it, so it optimizes by removing the entire call and thereby the entire loop structure).
But what happens when we don't look at the tail, but look at the start? Then it's anyone's guess. CPUs have pipeliners which make it far harder to predict. The days when you could say "this instruction takes 8 CPU cycles, it's a 4GHz CPU, so that should cost 2 billionths of a second to run, on the dot" are long gone - CPUs haven't worked that way for decades by now: they are parsing one opcode whilst running another, and there are multiple cores involved.
Furthermore, these days it takes the CPU hundreds of cycles to access main memory, and in practice it can't even do that directly anymore: it can only operate on cached pages, and if you try to read or write to memory, the instruction is translated to the right entry in the cache. If the page isn't cached, the CPU will freeze whilst the infrastructure around it finds a page to clear out and loads the entire page from main memory into the CPU cache (which has a hierarchy of layers to complicate matters), and the CPU is waiting for ages.
Given that operations done in completely different code and maybe even different threads can cause your cache pages to get flushed, it is obviously completely impossible to try to guess at how many nanoseconds something takes.
It also means if you take more memory than is needed, your code might not run any slower, but because it causes another page to get evicted from cache, code elsewhere now does. Is it even fair to 'blame' this code for a performance hit if trying to time this code never seems to result in slower execution?
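The cache behaviour described above can be felt even from plain Java. A rough illustration (not a benchmark - the timings here are subject to exactly the noise this answer describes): the same additions, but one traversal streams through memory while the other touches a new cache line on almost every access.

```java
public class CacheWalk {
    // Row-order traversal: consecutive accesses hit consecutive memory,
    // so most reads are served from already-cached lines.
    static long rowOrder(int[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++)
            for (int c = 0; c < m[r].length; c++)
                sum += m[r][c];
        return sum;
    }

    // Column-order traversal: each access jumps to a different row's array,
    // which tends to miss the cache on large matrices.
    static long colOrder(int[][] m) {
        long sum = 0;
        for (int c = 0; c < m[0].length; c++)
            for (int r = 0; r < m.length; r++)
                sum += m[r][c];
        return sum;
    }

    public static void main(String[] args) {
        int n = 2048;
        int[][] m = new int[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                m[r][c] = 1;

        long t0 = System.nanoTime();
        long a = rowOrder(m);
        long t1 = System.nanoTime();
        long b = colOrder(m);
        long t2 = System.nanoTime();
        // Both compute the same sum; only the memory access pattern differs.
        System.out.println("row: " + (t1 - t0) + " ns, col: " + (t2 - t1) + " ns, equal: " + (a == b));
    }
}
```

On most machines the column-order pass is noticeably slower, despite doing identical arithmetic.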
Complicated. Very complicated.
You can use JMH to benchmark code. It tries to smear out the random noise by running the code very, very often, and long enough that hotspot has kicked in (hotspot will notice a method is run a lot, and will analyse it and rewrite it to highly optimized machine code; because VMs usually spend 99% of the time in less than 1% of the loaded methods, and that analysis and rewrite is a very pricey process, Java doesn't do it for all code, only for code that is run a lot, so you have to wait for that to kick in for a true measurement).
Given that we're talking, in the example, about 20 if statements, you're never going to notice. We're waaaaaayy on the 'left' side of that performance curve, where the algorithmic performance is irrelevant and it's all about cache misses, the vagaries of the CPU pipeline, etc. Generally, you can consider any code that is less than 1000 instructions and which causes no cache misses to be instantaneous. CPUs are very fast, the bottleneck is more that they have to wait for other equipment in your hardware to catch up, and for those dreaded cache misses.
Nevertheless, as a principle? The first example needs to perform up to n checks, whereas the second only performs about log(n) checks. log(n) is generally regarded as 'as good as instant', because even if n were a million, log(n) would only be about 20.
In other words, none of this stuff matters unless you have literally thousands of if statements, and then it might, maybe, possibly be noticeable, but only if you run that method a heck of a lot.
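The nested form is, in effect, a hand-unrolled binary search over the thresholds. A sketch under that assumption, using Arrays.binarySearch as a stand-in for the nested ifs and a linear scan as a stand-in for the cascade (thresholds c_k = 10*k from the question's first table):

```java
import java.util.Arrays;

public class BucketForms {
    // Thresholds c0..c20 = 0, 10, ..., 200 (from the question's first table).
    static final int[] C = new int[21];
    static { for (int k = 0; k <= 20; k++) C[k] = 10 * k; }

    // Stand-in for the unrolled if/else-if cascade: a top-down linear scan.
    static int linear(int i) {
        for (int k = 20; k >= 0; k--) {
            if (i > C[k]) return k;   // index of the first threshold i exceeds
        }
        return -1;                    // i <= c0
    }

    // Stand-in for the hand-nested ifs: binary search over the thresholds.
    static int nested(int i) {
        int pos = Arrays.binarySearch(C, i);
        if (pos >= 0) return pos - 1; // i == C[pos], so C[pos-1] is the last one exceeded
        return -(pos + 1) - 1;        // insertion point minus one
    }

    public static void main(String[] args) {
        // Same answer for every input; the difference is up-to-21 comparisons
        // for linear() versus about 5 for nested().
        for (int i = -5; i <= 250; i++) {
            if (linear(i) != nested(i)) throw new AssertionError("mismatch at i=" + i);
        }
        System.out.println("linear scan and binary search agree on every input");
    }
}
```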
More generally, you don't even need log(n) performance; you can go with O(1) by jumping straight to the right location. This is exactly what switch statements do.
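With evenly spaced thresholds like the question's first table (c_k = 10*k), that 'jump straight to the right location' can be made explicit: compute the bucket index and dispatch on it. The labels and helper below are illustrative; when the case values are dense, javac compiles such a switch to a tableswitch, i.e. an indexed jump rather than a chain of comparisons.

```java
public class SwitchJump {
    // Maps i in (c_k, c_(k+1)] to a label by computing the bucket directly
    // and dispatching on it. Labels and ranges are illustrative.
    static String label(int i) {
        if (i <= 0 || i > 200) return "out of range";
        int bucket = (i - 1) / 10;     // 0 for (0,10], 1 for (10,20], ...
        switch (bucket) {
            case 0:  return "(c0, c1]";
            case 1:  return "(c1, c2]";
            // ... cases 2 through 19 would follow the same pattern ...
            default: return "(c" + bucket + ", c" + (bucket + 1) + "]";
        }
    }

    public static void main(String[] args) {
        System.out.println(label(5));    // (c0, c1]
        System.out.println(label(105));  // (c10, c11]
    }
}
```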
Thus, to conclude:
- For a couple of dozen if statements like this, don't worry about it; if you want the direct jump anyway, use a switch statement.
- Do worry when the algorithmic characteristic is bad (O(x^2) or worse, generally), and you expect that you'll end up with sufficiently large inputs that it will matter. Then, yeah, prematurely optimize and invent better algorithms. If you are trying to decision-tree your way through a million different options, definitely use a hashmap, and don't loop through a million-large list.

Upvotes: 0
Reputation: 79005
First, let's analyse your first code snippet:
if (i > 0) {
// ...
} else if (i > 1) {
// ...
} else if (i > 2) {
//...
None of these else if conditions makes sense: if i is not greater than 0, it is either equal to or less than 0, and therefore your else if conditions should be checking those values (i.e. equal to or less than 0) instead.
Now, let's analyse your second code snippet:
if (i > 10) {
if (i > 15) {
if (i > 18) {
//...
As you can see, if i is greater than 10, the second if condition will definitely be evaluated, and if the second one is true, the third one will definitely be evaluated.
Thus, you are comparing apples and oranges.
If the branching is the same, there should not be any difference in performance.
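To see the 'apples and oranges' point concretely, one can count how many conditions each shape actually evaluates for a given input; the instrumented gt helper below is illustrative, and the shapes are truncated versions of the two snippets above.

```java
public class ConditionCounts {
    static int count;

    // Instrumented comparison: behaves like a > b but counts evaluations.
    static boolean gt(int a, int b) { count++; return a > b; }

    // Shape of the first snippet (ascending cascade, truncated).
    static void cascade(int i) {
        count = 0;
        if (gt(i, 0)) { } else if (gt(i, 1)) { } else if (gt(i, 2)) { }
    }

    // Shape of the second snippet (nested ifs, truncated).
    static void nested(int i) {
        count = 0;
        if (gt(i, 10)) { if (gt(i, 15)) { if (gt(i, 18)) { } } }
    }

    public static void main(String[] args) {
        cascade(5);  System.out.println("cascade(5):  " + count + " condition(s)"); // 1
        nested(5);   System.out.println("nested(5):   " + count + " condition(s)"); // 1
        nested(16);  System.out.println("nested(16):  " + count + " condition(s)"); // 3
        cascade(-1); System.out.println("cascade(-1): " + count + " condition(s)"); // 3
    }
}
```

Different inputs drive the two shapes through different numbers of comparisons, which is why comparing them wholesale is misleading.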
Upvotes: 1