bvarner
bvarner

Reputation: 370

Bitwise operations for comparing numbers?

I've spent too many brain cycles on this over the last day.

I'm trying to come up with a set of bitwise operations that may re-implement the following condition:

uint8_t a, b;
uint8_t c, d;
uint8_t e, f;
...

bool result = (a == 0xff || a == b) && (c == 0xff || c == d) && (e == 0xff || e == f);

Code I'm looking at has four of these expressions, short-circuit &&ed together (as above).

I know this is an esoteric question, but the short-circuit nature of this and the timing of the above code in a tight loop makes the lack of predictable time a royal pain, and quite frankly, it seems to really suck on architectures where branch prediction isn't available, or so well implemented.

Is there such a beast that would be concise?

Upvotes: 2

Views: 2582

Answers (3)

chux
chux

Reputation: 154208

To remove timing variations with &&, ||, use &, |. @molbdnilo. Possible faster, maybe not. Certainly easier to parallel.

// bool result = (a == 0xff || a == b) && (c == 0xff || c == d) 
//     && (e == 0xff || e == f);
bool result = ((a == 0xff) | (a == b)) & ((c == 0xff) | (c == d))
    & ((e == 0xff) | (e == f));

Upvotes: 0

Chris Dodd
Chris Dodd

Reputation: 126488

So, if you really want to do bit-twiddling to make this "fast" (which you really should only do after profiling your code to make sure this is a bottleneck), what you want to do is vectorize this by packing all the values together into a wider word so you can do all the comparisons at once (one instruction), and then extract the answer from a few bits.

There are a few tricks to this. To compare two value for equality, you can xor (^) them and test to see if the result is zero. To test a field of a wider word to see if it is zero, you can 'pack' it with a 1 bit above, then subtract one and see if the extra bit you added is still 1 -- if it is now 0, the value of the field was zero.

Putting all this together, you want to do 6 8-bit compares at once. You can pack these values into 9 bit fields in a 64-bit word (9 bits to get that extra 1 guard bit your going to test for subtraction). You can fit up to 7 such 9 bit fields in a 64 bit int, so no problem

// pack 6 9-bit values into a word
#define VEC6x9(A,B,C,D,E,F)  (((uint64_t)(A) << 45) | ((uint64_t)(B) << 36) | ((uint64_t)(C) << 27) | ((uint64_t)(D) << 18) | ((uint64_t)(E) << 9) | (uint64_t)(F))

// the two values to compare
uint64_t v1 = VEC6x9(a, a, c, c, e, e);
uint64_t v2 = VEC6x9(b, 0xff, d, 0xff, f, 0xff);
uint64_t guard_bits = VEC6x9(0x100, 0x100, 0x100, 0x100, 0x100, 0x100);
uint64_t ones = VEC6x9(1, 1, 1, 1, 1, 1);
uint64_t alt_guard_bits = VEC6x9(0, 0x100, 0, 0x100, 0, 0x100);

// do the comparisons in parallel
uint64_t res_vec = ((v1 ^ v2) | guard_bits) - ones;

// mask off the bits we'll ignore (optional for clarity, not needed for correctness)
res_vec &= ~guard_bits;

// do the 3 OR ops in parallel
res_vec &= res_vec >> 9;

// get the result
bool result = (res_vec & alt_guard_bits) == 0;

The ORs and ANDs at the end are 'backwards' becuase the result bit for each comparison is 0 if the comparison was true (values were equal) and 1 if it was false (values were not equal.)

All of the above is mostly of interest if you are writing a compiler -- its how you end up implementing a vector comparison -- and it may well be the case that a vectorizing compiler will do it all for you automatically.

This can be much more efficient if you can arrange to have your initial values pre-packed into vectors. This may in turn influence your choice of data structures and allowable values -- if you arrange for your values to be 7-bit or 15-bit (instead of 8-bit) they may pack nicer when you add the guard bits...

Upvotes: 1

Phil1970
Phil1970

Reputation: 2624

You could modify how you store and interpret the data:

When a if 0xFF, do you need the value of b. If not, then make b equal to 0xFF and simplify the expression by removing the part that test for 0xFF.

Also, you might combine a, b and c in a single variable.

uint32_t abc;
uint32_t def;

bool result = abc == def;

Other operations might be slower but that loop should be much faster (single comparison instead of up to 6 comparisons).

You might want to use an union to be able to access byte individually or in group. In that case, make sure that the forth byte is always 0.

Upvotes: 0

Related Questions