Reputation: 1769
I came across the following logic here:
r = y ^ ((x ^ y) & -(x < y)); // min(x, y)
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
It says it is faster than doing
r = (x < y) ? x : y
Can someone explain it a bit more, with an example? How could it be faster?
Upvotes: 4
Views: 3983
Reputation: 516
Using bit manipulation:
#include <stdio.h>

void func(int a, int b) {
    int c = a - b;            /* difference; may overflow for operands of opposite sign and large magnitude */
    int k = (c >> 31) & 0x1;  /* sign bit of c: 1 if a < b, 0 otherwise (assumes 32-bit int, arithmetic shift) */
    int max = a - k * c;      /* a when a >= b, otherwise a - (a - b) = b */
    int min = b + k * c;      /* b when a >= b, otherwise b + (a - b) = a */
    printf("max = %d\nmin = %d", max, min);
}
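For illustration, a minimal driver (the main function below is my addition, not part of the answer; it assumes func from the snippet above is in the same file):

void func(int a, int b);   /* defined above */

int main(void) {
    func(3, 7);            /* prints: max = 7, min = 3 */
    return 0;
}

Note that because the trick relies on a - b, it can give wrong results when that subtraction overflows, whereas the XOR/mask version from the question does not subtract at all.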
Upvotes: 2
Reputation: 2306
The question does not specify the hardware this will run on. My answer will address the case where this is running on x86 (for instance, any PC). Let's look at the assembly generated by each.
; r = y ^ ((x ^ y) & -(x < y))
xor edx,edx
cmp ebx,eax
mov ecx,eax
setl dl
xor ecx,ebx
neg edx
and edx,ecx
xor eax,edx
; r = (x < y) ? x : y
cmp ebx,eax
cmovl eax,ebx
The XOR version has to zero registers and move values around on top of the operations it inherently needs to perform, adding up to 8 instructions. However, x86 has a cmov (conditional move) instruction, so the ?: version compiles to a comparison and a cmovl, just 2 instructions. This doesn't necessarily make the ?: version 4 times faster, since different instructions may have different latencies and different dependency chains, but you can certainly see how ?: will very likely be faster than the XOR version.
It's also worth noting that neither version requires a branch, and so there is no branch misprediction penalty.
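For reference, here is a small sketch (function names are mine) that can be compiled with something like gcc -O2 -S to reproduce assembly comparable to the listings above:

/* Two equivalent minimum functions; compile with e.g. `gcc -O2 -S min.c`
   and compare the generated code. */
int min_xor(int x, int y) {
    return y ^ ((x ^ y) & -(x < y));   /* branchless bit trick */
}

int min_ternary(int x, int y) {
    return (x < y) ? x : y;            /* typically cmp + cmovl on x86 */
}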
Upvotes: 1
Reputation:
The ?: operator risks being implemented with a conditional branch (instead of a conditional assignment).
Conditional branching is a small "catastrophe" for a processor, as it cannot guess which instruction will be fetched next. This breaks the pipeline organization of the ALU (several instructions being in progress concurrently to increase throughput) and causes pipeline re-initialization delays. To alleviate this, processors resort to branch prediction, i.e. they bet on the branch that will be taken, but they can't be successful all the time.
In conclusion: conditional branches can be slloooowwwwwwww...
Upvotes: 0
Reputation: 5512
In the link that you provided, it is explicitly stated:
On some rare machines where branching is very expensive and no condition move instructions exist, the [code] might be faster than the obvious approach, r = (x < y) ? x : y
Later on, it says:
On some machines, evaluating (x < y) as 0 or 1 requires a branch instruction, so there may be no advantage.
In short, the bit manipulation solution is only faster on machines that have poor branch execution, as it operates solely on the numerical values of the operands. On most machines the branching approach is just as fast (and sometimes even faster) and should be preferred for its readability.
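To make the trick itself concrete (the OP asked for an example), here is a small worked sketch with values I picked:

#include <stdio.h>

/* With x = 3, y = 7:
     x < y           -> 1
     -(x < y)        -> -1, i.e. a mask with all bits set
     (x ^ y) & mask  -> x ^ y   (the mask keeps every bit)
     y ^ (x ^ y)     -> x = 3   (the minimum)
   When x >= y the mask is 0, the AND yields 0, and y ^ 0 = y. */
int main(void) {
    int x = 3, y = 7;
    int min = y ^ ((x ^ y) & -(x < y));
    int max = x ^ ((x ^ y) & -(x < y));
    printf("min = %d, max = %d\n", min, max);   /* min = 3, max = 7 */
    return 0;
}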
Upvotes: 6
Reputation: 214495
Discussing optimization without specific hardware in mind doesn't make much sense. You really can't tell which alternative is fastest without going into the details of a specific system. Boldly stating that the first alternative is fastest without any specific hardware in mind is just premature optimization.
The obscure XOR solution might be faster than the comparison alternative if the given CPU's performance relies heavily on branch prediction. In other words, if it executes regular instructions such as arithmetic ones very fast, but hits a performance bottleneck at any conditional statement (such as an if), where the code might branch in several directions. Other factors, such as the amount of instruction cache memory, also matter.
Many CPUs will, however, execute the second alternative much faster, because it involves fewer operations.
So to sum it up, you'd have to be an expert on the given CPU to tell in theory which code will be the fastest. If you aren't such an expert, simply benchmark it and see. Or look at the disassembly for notable differences.
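If you do want to benchmark, here is a rough sketch along those lines (the loop count, LCG constants, and all names are my choices; a serious measurement needs warm-up, multiple runs, and care that the compiler doesn't fold the loops away):

#include <stdio.h>
#include <time.h>

#define N 100000000L

int main(void) {
    unsigned int r = 12345u;      /* simple LCG state for pseudo-random inputs */
    int y = 1000000;              /* fixed second operand */
    volatile long sink = 0;       /* volatile so the loops aren't optimized away */

    clock_t t0 = clock();
    for (long i = 0; i < N; i++) {
        r = r * 1103515245u + 12345u;        /* unsigned LCG step, no overflow UB */
        int x = (int)(r & 0x7fffffff);       /* non-negative pseudo-random value */
        sink += y ^ ((x ^ y) & -(x < y));    /* XOR min */
    }
    clock_t t1 = clock();
    for (long i = 0; i < N; i++) {
        r = r * 1103515245u + 12345u;
        int x = (int)(r & 0x7fffffff);
        sink += (x < y) ? x : y;             /* ternary min */
    }
    clock_t t2 = clock();

    printf("xor:     %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("ternary: %.2f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}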
Upvotes: 11