Reputation: 2096
I would like to cast unsigned int (32bit) A to unsigned short int (16bit) B in a following way:
In other words to cast A but if it is > of maximum allowed value for 16bit to set it as max value.
How can this be achieved with bit operations or other non branching method?
Upvotes: 3
Views: 2995
Reputation: 279245
Find minimum of two integers without branching:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
On some rare machines where branching is very expensive and no condition move instructions exist, the above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two more instructions. (Typically, the obvious approach is best, though.)
Just to kick things off, here's a brain-dead benchmark. I'm trying to get a 50/50 mix of large and small values "at random":
#include <iostream>
#include <stdint.h>
int main() {
uint32_t total = 0;
uint32_t n = 27465;
for (int i = 0; i < 1000*1000*500; ++i) {
n *= 30029; // worst PRNG in the world
uint32_t a = n & 0x1ffff;
#ifdef EMPTY
uint16_t b = a; // gives the wrong total, of course.
#endif
#ifdef NORMAL
uint16_t b = (a > 0xffff) ? 0xffff : a;
#endif
#ifdef RUSLIK
uint16_t b = (-(a >> 16) >> 16) | a;
#endif
#ifdef BITHACK
uint16_t b = a ^ ((0xffff ^ a) & -(0xffff < a));
#endif
total += b;
}
std::cout << total << "\n";
}
On my compiler (gcc 4.3.4 on cygwin with -O3), NORMAL
wins, followed by RUSLIK
, then BITHACK
, respectively 0.3, 0.5 and 0.9 seconds slower than the empty loop. Really this benchmark means nothing, I haven't even checked the emitted code to see whether the compiler's smart enough to outwit me somewhere. But I like ruslik's anyway.
Upvotes: 2
Reputation: 106127
First off, the phrase "non-branching method" doesn't technically make sense when discussing C code; the optimizer may find ways to remove branches from "branchy" C code, and conversely would be entirely within its rights to replace your clever non-branching code with a branch just to spite you (or because some heuristic said it would be faster).
That aside, the simple expression:
uint16_t b = a > UINT16_MAX ? UINT16_MAX : a;
despite "having a branch", will be compiled to some sort of (branch-free) conditional move (or possible just a saturate) by many compilers on many systems (I just tried three different compilers for ARM and Intel, and all generated a conditional move).
I would use that simple, readable expression. If and only if your compiler isn't smart enough to optimize it (or your target architecture doesn't have conditional moves), and if you have benchmark data that shows this to be a bottleneck for your program, then I would (a) find a better compiler and (b) file a bug against your compiler and only then look for clever hacks.
If you're really, truly devoted to being too clever by half, then ruslik's second suggestion is actually quite beautiful (much nicer than a generic min/max).
Upvotes: 0
Reputation: 14870
It will work for unsigned values:
b = -!!(a >> 16) | a;
or, something similar:
static inline unsigned short int fn(unsigned int a){
return (-(a >> 16) >> 16) | a;
};
Upvotes: 5
Reputation: 61508
1) With an intrinsic on a CPU that natively does this sort of convertion.
2) You're probably not going to like this, but:
c = a >> 16; /* previously declared as a short */
/* Saturate 'c' with 1s if there are any 1s, by first propagating
1s rightward, then leftward. */
c |= c >> 8;
c |= c >> 4;
c |= c >> 2;
c |= c >> 1;
c |= c << 1;
c |= c << 2;
c |= c << 4;
c |= c << 8;
b = a | c; /* implicit truncation */
Upvotes: 0