Reputation: 203
I need to optimize an expression of the form:
(a > b) || (a > c)
I tried several optimized forms one of which is as follows:
(a * 2) > (b + c)
Optimization is not from the compiler's point of view. I would like to reduce the two >s to one.
This is based on the assumption that 1 <= (a, b, c) <= 26
However, this works only for some cases. Is the optimization I am trying to do, really possible? If yes, a start would be really helpful.
Upvotes: 1
Views: 949
Reputation: 11582
The best I can come up with is this
char a, b, c;
std::cin >> a >> b >> c;
if (((b-a) | (c-a)) & 0x80) {
// a > b || a > c
}
With gcc -O2
this generates only one conditional branch
40072e: 29 c8 sub %ecx,%eax
400730: 29 ca sub %ecx,%edx
400732: 09 d0 or %edx,%eax
400734: a8 80 test $0x80,%al
400736: 74 17 je 40074f <main+0x3f>
This leverages the constraints of the input values, since the values cannot be greater than 26 then subtracting a
from b
will give you a negative value when a > b
, in two's complement you know bit 7
will be set in that case - the same applies to c
. I then OR both so that bit 7
indicates whether a > b || a > c
, lastly we inspect bit 7
by AND with 0x80 and branch on that.
Update: Out of curiosity I timed 4 different ways of coding this. To generate test data I used a simple linear congruential pseudo-random number generator. I timed it in a loop for 100 million iterations. I assumed for simplicity that if the condition is true we want to add 5 to a counter, do nothing otherwise. I timed it using g++ (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2)
on an Intel Xeon X5570 @ 2.93GHz
using -O2
optimization level.
Here's the code (comment out all but one of the conditional variants):
#include <iostream>
unsigned myrand() {
static unsigned x = 1;
return (x = x * 1664525 + 1013904223);
}
int main() {
size_t count = 0;
for(size_t i=0; i<100000000; ++i ) {
int a = 1 + myrand() % 26;
int b = 1 + myrand() % 26;
int c = 1 + myrand() % 26;
count += 5 & (((b-a) | (c-a)) >> 31); // 0.635 sec
//if (((b-a) | (c-a)) & 0x80) count += 5; // 0.660 sec
//if (a > std::max(b,c)) count += 5; // 0.677 sec
//if ( a > b || a > c) count += 5; // 1.164 sec
}
std::cout << count << std::endl;
return 0;
}
The fastest is a modification on the suggestion in my answer, where we use sign extension to generate a mask that is either 32 1s
or 32 0s
depending on whether the condition is true of false, and use that to mask the 5
being added so that it either adds 5 or 0. This variation has no branches. The times are in a comment on each line. The slowest was the original expression ( a > b || a > c)
.
Upvotes: 1
Reputation: 129374
I'm quite doubtful this is even an optimisation in most cases.
a > b || a > c
will evaluate to:
compare a b
jump not greater
compare a c
jump not greater
where
a * 2 > b + c
gives:
shift a left 1 (in temp1)
add b to c (in temp2)
compare temp1 temp2
jump if not greater
As always with performance, it's always much better to base your decision on actual performance measurements (preferably on a selection of processor architectures).
Upvotes: 2
Reputation: 1219
The answer is probably: you do not want to optimize that. Moreover, I doubt that there's any way to write this more efficiently. If you say that a, b and c are values between 1 and 26, you shouldn't be using integers (you don't need that precision) if you wanted to be optimal (in size) anyway.
If a > b, the expression a > c will not be executed anyway. So you have at maximum 2 (and at minimum 1) conditional operations, which is really not worth an optimization.
Upvotes: 4