Reputation: 3140
I'm trying to write a SWAR compare-for-equality operation, working on uint64_t pretending to be 8 'lanes' of uint8_t. The closest I've managed to achieve, based on techniques in Hacker's Delight and Bit Twiddling Hacks, is the following:
uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    uint64_t mask = 0x7F * 0x0101010101010101ULL;
    uint64_t tmp = (xored & mask) + mask;
    return ~(tmp | xored | mask);
}
However, this puts 0x80 into 'lanes' which match, and 0x00 into 'lanes' that don't, whereas I want 0xFF in 'lanes' that match, and 0x00 in 'lanes' that don't. Is it possible to write this without branching?
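To pin down the intended result, the behaviour can be written lane by lane. The sketch below is only a reference to test a branch-free version against; the names compare_eq_ref and compare_eq_ref_selfcheck are hypothetical and are not part of the code above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: compare the 8 byte lanes one at a time.
   Writes 0xFF into every lane where x and y agree, 0x00 elsewhere. */
uint64_t compare_eq_ref(uint64_t x, uint64_t y) {
    uint64_t result = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint64_t xb = (x >> (8 * lane)) & 0xFF;
        uint64_t yb = (y >> (8 * lane)) & 0xFF;
        if (xb == yb) {
            result |= (uint64_t)0xFF << (8 * lane);
        }
    }
    return result;
}

/* Small self-check of the reference itself. */
void compare_eq_ref_selfcheck(void) {
    assert(compare_eq_ref(0, 0) == 0xFFFFFFFFFFFFFFFFULL);
    assert(compare_eq_ref(0x1122334455667788ULL, 0x1122004455007788ULL)
           == 0xFFFF00FFFF00FFFFULL);
}
```

This loop is the slow, obviously correct version; the goal is a function that returns the same value for every pair of inputs without the per-lane test.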
Upvotes: 1
Views: 369
Reputation: 5040
For the record, this is just a variant of the technique for calculating a high bit in nonzero bytes (one instruction fewer), put together with the comments from @njuffa and @Nate Eldredge. It is probably slightly more efficient than the version in 4386427's answer.
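uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    /* 0x80 in every lane where xored is zero (lanes that match), 0x00 elsewhere */
    uint64_t mask = ((((xored >> 1) | 0x8080808080808080ULL) - xored) & 0x8080808080808080ULL);
    /* widen each 0x80 flag to 0xFF */
    return (mask << 1) - (mask >> 7);
}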
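As a sketch of why the two lines work, they can be split into separate helpers. The names zero_byte_flags, expand_flags and swar_selfcheck are hypothetical, introduced only for illustration. In the first step each lane computes ((b >> 1) | 0x80) - b, which is 0x80 when b is 0 and stays below 0x80 otherwise, so no lane borrows from its neighbour. In the second step a flag in lane i contributes 2^(8i+8) - 2^(8i), which is 0xFF shifted into lane i; for the top lane the shifted-out bit simply wraps modulo 2^64.

```c
#include <assert.h>
#include <stdint.h>

#define HIGH_BITS 0x8080808080808080ULL

/* Hypothetical helper: 0x80 in every lane of v that is zero, 0x00 elsewhere. */
uint64_t zero_byte_flags(uint64_t v) {
    return (((v >> 1) | HIGH_BITS) - v) & HIGH_BITS;
}

/* Hypothetical helper: widen per-lane 0x80 flags to 0xFF.
   Assumes every lane of flags is either 0x80 or 0x00. */
uint64_t expand_flags(uint64_t flags) {
    return (flags << 1) - (flags >> 7);
}

/* Small self-check covering the top lane and mixed lanes. */
void swar_selfcheck(void) {
    assert(zero_byte_flags(0) == HIGH_BITS);
    assert(zero_byte_flags(0x00FF000100800000ULL) == 0x8000800080008080ULL);
    assert(expand_flags(0x8000000000000000ULL) == 0xFF00000000000000ULL);
    assert(expand_flags(0x0000800000008000ULL) == 0x0000FF000000FF00ULL);
}
```

Composing the two helpers on x ^ y should give the same result as the compare_eq function above.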
Upvotes: 3
Reputation: 44329
To start with there is a bug (a typo?) in the posted code:
uint64_t mask = 0x7F * 0x0101010101010101ULL;
^^
Missing 0x
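Once each lane holds either 0x80 or 0x00, you can divide by 0x80 (which leaves 0x01 or 0x00 in each lane) and then multiply by 0xff (which turns each 0x01 into 0xFF without carrying into the next lane).
Like: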
uint64_t compare_eq (uint64_t x, uint64_t y) {
    uint64_t xored = x ^ y;
    uint64_t mask = 0x7F * 0x0101010101010101ULL;
    uint64_t tmp = (xored & mask) + mask;
    uint64_t res = ~(tmp | xored | mask);
    res = res / 0x80;
    res = res * 0xff;
    return res;
}
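Because the value is unsigned and 0x80 is a power of two, the division is equivalent to a right shift by 7. A minimal sketch of just that last step, with hypothetical names widen_flags and widen_flags_selfcheck, and assuming every lane of the input is either 0x80 or 0x00:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn per-lane 0x80 flags into 0xFF. */
uint64_t widen_flags(uint64_t flags) {
    uint64_t ones = flags >> 7;  /* same as flags / 0x80: 0x01 per flagged lane */
    return ones * 0xFF;          /* 0x01 * 0xFF = 0xFF, no carry between lanes */
}

/* Small self-check on the lowest and highest lanes. */
void widen_flags_selfcheck(void) {
    assert(widen_flags(0x8000000000000080ULL) == 0xFF000000000000FFULL);
    assert(widen_flags(0) == 0);
}
```

The multiply cannot overflow a lane because each lane of the shifted value is at most 0x01, so each partial product fits in its own 8 bits.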
Upvotes: 1