Reputation: 370
I've spent too many brain cycles on this over the last day.
I'm trying to come up with a set of bitwise operations that may re-implement the following condition:
uint8_t a, b;
uint8_t c, d;
uint8_t e, f;
...
bool result = (a == 0xff || a == b) && (c == 0xff || c == d) && (e == 0xff || e == f);
Code I'm looking at has four of these expressions, short-circuit &&
ed together (as above).
I know this is an esoteric question, but the short-circuit nature of this and the timing of the above code in a tight loop makes the lack of predictable time a royal pain, and quite frankly, it seems to really suck on architectures where branch prediction isn't available, or so well implemented.
Is there such a beast that would be concise?
Upvotes: 2
Views: 2582
Reputation: 154208
To remove timing variations with &&, ||
, use &, |
. @molbdnilo. Possible faster, maybe not. Certainly easier to parallel.
// bool result = (a == 0xff || a == b) && (c == 0xff || c == d)
// && (e == 0xff || e == f);
bool result = ((a == 0xff) | (a == b)) & ((c == 0xff) | (c == d))
& ((e == 0xff) | (e == f));
Upvotes: 0
Reputation: 126488
So, if you really want to do bit-twiddling to make this "fast" (which you really should only do after profiling your code to make sure this is a bottleneck), what you want to do is vectorize this by packing all the values together into a wider word so you can do all the comparisons at once (one instruction), and then extract the answer from a few bits.
There are a few tricks to this. To compare two value for equality, you can xor (^) them and test to see if the result is zero. To test a field of a wider word to see if it is zero, you can 'pack' it with a 1 bit above, then subtract one and see if the extra bit you added is still 1 -- if it is now 0, the value of the field was zero.
Putting all this together, you want to do 6 8-bit compares at once. You can pack these values into 9 bit fields in a 64-bit word (9 bits to get that extra 1 guard bit your going to test for subtraction). You can fit up to 7 such 9 bit fields in a 64 bit int, so no problem
// pack 6 9-bit values into a word
#define VEC6x9(A,B,C,D,E,F) (((uint64_t)(A) << 45) | ((uint64_t)(B) << 36) | ((uint64_t)(C) << 27) | ((uint64_t)(D) << 18) | ((uint64_t)(E) << 9) | (uint64_t)(F))
// the two values to compare
uint64_t v1 = VEC6x9(a, a, c, c, e, e);
uint64_t v2 = VEC6x9(b, 0xff, d, 0xff, f, 0xff);
uint64_t guard_bits = VEC6x9(0x100, 0x100, 0x100, 0x100, 0x100, 0x100);
uint64_t ones = VEC6x9(1, 1, 1, 1, 1, 1);
uint64_t alt_guard_bits = VEC6x9(0, 0x100, 0, 0x100, 0, 0x100);
// do the comparisons in parallel
uint64_t res_vec = ((v1 ^ v2) | guard_bits) - ones;
// mask off the bits we'll ignore (optional for clarity, not needed for correctness)
res_vec &= ~guard_bits;
// do the 3 OR ops in parallel
res_vec &= res_vec >> 9;
// get the result
bool result = (res_vec & alt_guard_bits) == 0;
The ORs and ANDs at the end are 'backwards' becuase the result bit for each comparison is 0 if the comparison was true (values were equal) and 1 if it was false (values were not equal.)
All of the above is mostly of interest if you are writing a compiler -- its how you end up implementing a vector comparison -- and it may well be the case that a vectorizing compiler will do it all for you automatically.
This can be much more efficient if you can arrange to have your initial values pre-packed into vectors. This may in turn influence your choice of data structures and allowable values -- if you arrange for your values to be 7-bit or 15-bit (instead of 8-bit) they may pack nicer when you add the guard bits...
Upvotes: 1
Reputation: 2624
You could modify how you store and interpret the data:
When a
if 0xFF, do you need the value of b
. If not, then make b
equal to 0xFF
and simplify the expression by removing the part that test for 0xFF
.
Also, you might combine a
, b
and c
in a single variable.
uint32_t abc;
uint32_t def;
bool result = abc == def;
Other operations might be slower but that loop should be much faster (single comparison instead of up to 6 comparisons).
You might want to use an union to be able to access byte individually or in group. In that case, make sure that the forth byte is always 0.
Upvotes: 0