Reputation: 76597
I have the following code to expand each bit of a byte into a byte of its own.
__device__ UINT64 bitToByte(const UINT8 input) {
UINT64 b = ((0x8040201008040201ULL * input) >> 7) & 0x0101010101010101ULL;
//reverse the byte order <<-- this step is missing
return b;
}
However the bytes are in the wrong order, the endianness is reversed.
On the CPU I can simply do a bswap reg,reg
to fix this, but what do I do on the GPU?
Alternatively, what similar trick can I use so that the bytes come out the right way round, i.e. the most significant bit goes to the most significant byte, so that I don't need a bswap at all?
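To make the ordering problem concrete, here is a minimal host-side sketch of the same expression. It assumes std::uint64_t and std::uint8_t as stand-ins for UINT64 and UINT8, and the name bitToByteAsPosted is only illustrative. It is plain C++, so it should also compile unchanged inside a .cu file.

```cpp
#include <cstdint>

// Host-side copy of the expression from the question.
// std::uint64_t / std::uint8_t are assumed equivalents of UINT64 / UINT8.
std::uint64_t bitToByteAsPosted(std::uint8_t input) {
    return ((0x8040201008040201ULL * input) >> 7) & 0x0101010101010101ULL;
}

// Helper (hypothetical name): byte k of a 64-bit word, k = 0 is least significant.
std::uint8_t byteAt(std::uint64_t word, int k) {
    return static_cast<std::uint8_t>(word >> (8 * k));
}
```

With this, bitToByteAsPosted(0x01) has its single 1 in byte 7 and bitToByteAsPosted(0x80) has it in byte 0, which is the reverse of what I want.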
Upvotes: 5
Views: 522
Reputation: 11547
To get the bytes in the right order, the bit extraction can use the same multiplication trick with the coefficients changed so that bit i is shifted into byte i. With a single multiplication the shifted copies of the input would be only 7 bits apart and would overlap, so it must be done in two steps, one for the even bits and one for the odd bits. The copies are then 14 bits apart, so each one has close to 2 bytes to itself, which is enough to keep them from clashing.
__device__ UINT64 bitToByte(const UINT8 input) {
UINT64 b = ( ((0x0002000800200080ULL * input) >> 7) & 0x0001000100010001ULL)
| ( ((0x0100040010004000ULL * input) >> 7) & 0x0100010001000100ULL);
return b;
}
As spotted in the comments, the two shifts can be factored out into a single one as an optimization.
__device__ UINT64 bitToByte(const UINT8 input) {
UINT64 b = ( ((0x0002000800200080ULL * input) & 0x0080008000800080ULL)
| ((0x0100040010004000ULL * input) & 0x8000800080008000ULL) )
>> 7 ;
return b;
}
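Both variants can be checked exhaustively on the host, since there are only 256 inputs. The following is a sketch under a few assumptions: std:: fixed-width types stand in for UINT64 and UINT8, and the function names (expandNaive, expandSplit, expandSplitFactored, checkAllInputs) are made up for the test.

```cpp
#include <cassert>
#include <cstdint>

// Reference: bit i of the input becomes byte i of the result (value 0 or 1).
std::uint64_t expandNaive(std::uint8_t input) {
    std::uint64_t out = 0;
    for (int i = 0; i < 8; ++i) {
        out |= static_cast<std::uint64_t>((input >> i) & 1u) << (8 * i);
    }
    return out;
}

// Two multiplications (even bits, odd bits), shift applied before masking.
std::uint64_t expandSplit(std::uint8_t input) {
    return (((0x0002000800200080ULL * input) >> 7) & 0x0001000100010001ULL)
         | (((0x0100040010004000ULL * input) >> 7) & 0x0100010001000100ULL);
}

// Same computation with the shift factored out, so the masks are pre-shifted by 7.
std::uint64_t expandSplitFactored(std::uint8_t input) {
    return (((0x0002000800200080ULL * input) & 0x0080008000800080ULL)
          | ((0x0100040010004000ULL * input) & 0x8000800080008000ULL)) >> 7;
}

// Compare both variants against the reference for every possible byte.
void checkAllInputs() {
    for (int v = 0; v < 256; ++v) {
        const std::uint8_t in = static_cast<std::uint8_t>(v);
        assert(expandSplit(in) == expandNaive(in));
        assert(expandSplitFactored(in) == expandNaive(in));
    }
}
```

Calling checkAllInputs() from main (with assertions enabled) exercises every input value.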
Upvotes: 2
Reputation: 76597
Thanks to @tera, here is the answer:
//Expand every bit into a byte
__device__ static UINT64 Add012(const UINT8 input) {
const UINT64 b = ((0x8040201008040201ULL * input) >> 7) & 0x0101010101010101ULL; //extract every bit into a byte
//unfortunately this returns the bytes in the wrong order
UINT32* const b2 = (UINT32*)&b;
UINT64 Result;
UINT32* const Result2 = (UINT32*)&Result;
Result2[0] = __byte_perm(b2[0]/*LSB*/, b2[1], 0x4567); //swap the bytes around, the MSB's go into the LSB in reverse order
Result2[1] = __byte_perm(b2[0]/*LSB*/, b2[1], 0x0123); //and the LSB -> MSB reversed.
return Result;
}
The __byte_perm intrinsic replaces the bswap instruction.
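To see why the selectors 0x4567 and 0x0123 add up to a full 64-bit byte swap, here is a small host-side model of __byte_perm. It follows my reading of the CUDA Math API description: result byte n is taken from the eight input bytes of x and y, indexed by bits 4n..4n+2 of the selector. The names bytePermModel and byteSwap64Model are hypothetical, and this is only a model for reasoning about the selectors, not the intrinsic itself.

```cpp
#include <cstdint>

// Host-side model of __byte_perm(x, y, s) as described in the CUDA Math API:
// input bytes 0-3 come from x, bytes 4-7 from y; result byte n is the input
// byte whose index is stored in bits 4n..4n+2 of s.
std::uint32_t bytePermModel(std::uint32_t x, std::uint32_t y, std::uint32_t s) {
    const std::uint64_t bytes = (static_cast<std::uint64_t>(y) << 32) | x;
    std::uint32_t result = 0;
    for (int n = 0; n < 4; ++n) {
        const unsigned sel = (s >> (4 * n)) & 0x7u;
        const std::uint32_t picked =
            static_cast<std::uint32_t>((bytes >> (8 * sel)) & 0xFFu);
        result |= picked << (8 * n);
    }
    return result;
}

// 64-bit byte reversal built from two modelled permutes, using the same
// selectors as the device code: 0x4567 for the low word, 0x0123 for the high.
std::uint64_t byteSwap64Model(std::uint64_t v) {
    const std::uint32_t lo = static_cast<std::uint32_t>(v);
    const std::uint32_t hi = static_cast<std::uint32_t>(v >> 32);
    const std::uint32_t newLo = bytePermModel(lo, hi, 0x4567);
    const std::uint32_t newHi = bytePermModel(lo, hi, 0x0123);
    return (static_cast<std::uint64_t>(newHi) << 32) | newLo;
}
```

For example, byteSwap64Model(0x0102030405060708ULL) gives 0x0807060504030201ULL.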
Alternatively, the input can be reversed using the __brev (bit-reverse) intrinsic:
//Expand every bit into a byte
__device__ static UINT64 Add012(const UINT8 input) {
const UINT32 reversed = (__brev(input) >> 24);
return ((0x8040201008040201ULL * reversed) >> 7) & 0x0101010101010101ULL; //extract every bit into a byte
}
The second version looks easier.
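For checking the reversed-input version off the GPU, a software bit reversal can stand in for (__brev(input) >> 24). This is just a sketch: reverseByte and expandViaReversedInput are illustrative names, and std:: fixed-width types are assumed in place of UINT8 and UINT64.

```cpp
#include <cstdint>

// Software stand-in for (__brev(input) >> 24): reverse the 8 bits of a byte.
std::uint8_t reverseByte(std::uint8_t v) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        // Shift the result left and append bit i of v, so bit 0 ends up on top.
        r = static_cast<std::uint8_t>((r << 1) | ((v >> i) & 1u));
    }
    return r;
}

// Original single-multiplication trick applied to the bit-reversed input,
// so that bit i of the input lands in byte i of the result.
std::uint64_t expandViaReversedInput(std::uint8_t input) {
    const std::uint64_t reversed = reverseByte(input);
    return ((0x8040201008040201ULL * reversed) >> 7) & 0x0101010101010101ULL;
}
```

Here expandViaReversedInput(0x01) yields 0x0000000000000001ULL and expandViaReversedInput(0x80) yields 0x0100000000000000ULL, i.e. the most significant bit ends up in the most significant byte.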
Upvotes: 3
Reputation: 1724
Instead of reversing the result, you can reverse the input with any of the tricks explained here. For example, using the approach of this answer:
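//bit-reversed value of every nibble; must live in device memory to be readable from device code
__device__ static const UINT8 lookup[16] = {
0x0, 0x8, 0x4, 0xc, 0x2, 0xa, 0x6, 0xe,
0x1, 0x9, 0x5, 0xd, 0x3, 0xb, 0x7, 0xf, };
//reverse the bits of a byte: reverse each nibble and swap the two nibbles
__device__ UINT8 reverse(UINT8 n) {
return (lookup[n & 0xF] << 4) | lookup[n >> 4];
}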
__device__ UINT64 bitToByte(const UINT8 input) {
UINT64 b = ((0x8040201008040201ULL * reverse(input)) >> 7) & 0x0101010101010101ULL;
return b;
}
Upvotes: 2