kapilddit
kapilddit

Reputation: 1769

Find out max & min of two number without using If else?

I am able to find out the logic from: Here

r = y ^ ((x ^ y) & -(x < y)); // min(x, y)
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)

It says it is faster then doing

r = (x < y) ? x : y

Can someone explain a bit more about it to understand it with example. How it could be faster?

Upvotes: 4

Views: 3983

Answers (5)

user3250183
user3250183

Reputation: 516

Using bit manipulation :

void func(int a,int b){
    int c = a - b;
    int k = (c >> 31) & 0x1;
    int max = a - k * c;
    int min = b + k * c;
    printf("max = %d\nmin = %d",max,min);
    }

Upvotes: 2

Apriori
Apriori

Reputation: 2306

The question does not specify the hardware this will run on. My answer will address the case where this is running on x86 (for instance any PC). Lets look at the assembly generated by each.

; r = y ^ ((x ^ y) & -(x < y))
xor    edx,edx
cmp    ebx,eax
mov    ecx,eax
setl   dl
xor    ecx,ebx
neg    edx
and    edx,ecx  
xor    eax,edx  

; r = (x < y) ? x : y
cmp    ebx,eax  
cmovl  eax,ebx  

The XOR version has to zero registers and move values around on top of the operations it inherently needs to perform, adding up to 8 instructions. However x86 has a cmov or conditional move instruction. So the ?: version compiles to a comparison and a cmovl, just 2 instructions. However this doesn't necessary make the ?: version 4 times faster since different instructions may have different latencies, and different dependency chains. But you can certainly see how ?: will very likely be faster than the XOR version.

It's also worth noting that neither version requires a branch, and so there is no branch misprediction penalty.

Upvotes: 1

user1196549
user1196549

Reputation:

The ? risks to be implemented with a conditional branch (instead of a conditional assignment).

Conditional branching is a small "catastrophy" for a processor, as it cannot guess what instruction will be fetched later. This breaks the pipeline organization of the ALU (several instructions being in progress concurrently to increase throughput), and causes pipeline re-initialization delays. To alleviate this, processors resort to branch prediction, i.e. they bet on the branch that will be taken, but they can't be successful all the time.

In conclusion: conditional branches can be slloooowwwwwwww...

Upvotes: 0

Daniel Kleinstein
Daniel Kleinstein

Reputation: 5512

In the link that you provided, it is explicitly stated:

On some rare machines where branching is very expensive and no condition move instructions exist, the [code] might be faster than the obvious approach, r = (x < y) ? x : y

Later on, it says:

On some machines, evaluating (x < y) as 0 or 1 requires a branch instruction, so there may be no advantage.

In short, the bit manipulation solution is only faster on machines that have poor branch execution, as it operates solely on the numerical values of the operands. On most machines the branching approach is just as fast (and sometimes even faster) and should be preferred for its readability.

Upvotes: 6

Lundin
Lundin

Reputation: 214495

Discussing optimization without a specific hardware in mind doesn't make any sense. You really can't tell which alternative that is fastest without going into details of a specific system. Boldly making a statement about the first alternative being fastest without any specific hardware in mind, is just pre-mature optimization.

The obscure xor solution might be faster than the comparison alternative if the given CPU's performance relies heavily on branch prediction. In other words, if it executes regular instructions such as arithmetic ones very fast, but gets a performance bottleneck at any conditional statement (such as an if), where the code might branch on several ways. Other factors such as the amount instruction cache memory etc. also matter.

Many CPUs will however execute the second alternative much faster, because it involves fewer operations.

So to sum it up, you'll have to be an expert of the given CPU to actually tell in theory which code that will be the fastest. If you aren't such an expert, simply benchmark it and see. Or look at the disassembly for notable differences.

Upvotes: 11

Related Questions