Reputation: 1856
Is it possible to use SSE for bit manipulations on data that is not byte-aligned? For example, I would like to implement this using SSE:
const char buf[8];
assert(n <= 8);
long rv = 0;
for (int i = 0; i < n; i++)
    rv = (rv << 6) | (buf[i] & 0x3f);
Instead, I would like to load buf into an xmm register and use SSE instructions to avoid the loop. Unfortunately, the shift operations (such as PSLLW) shift each packed integer by the same amount, so I cannot use them here. Using multiplication (PMULLW) to emulate shifts does not seem right either...
Looking at the SSE documentation, it appears that bit manipulations are not particularly well supported in general. Is this true? Or are there nice bit-manipulation examples using SSE?
Upvotes: 1
Views: 1364
Reputation: 18227
I'm not sure SSE instructions help reduce the number of operations required to implement what your code performs here; if anyone knows, I'd be curious as well. Let's decompose the code a bit.
The code is an iterated shift/OR sequence: you take the lowest 6 bits of a byte, shift the result left by six, OR in the lowest 6 bits of the next byte, shift again, and so on.
So you're converting an array of eight-bit values into a packed array of six-bit values, which shrinks the data from 64 bits to 48 bits. Like:
|76543210|76543210|76543210|76543210|76543210|76543210|76543210|76543210|
|-----------------|54321054|32105432|10543210|54321054|32105432|10543210|
You can therefore unroll the loop (for the n == 8 case) and write it as follows:
/*
 * ((unsigned long)buf[x] << 58)
 *     moves the lowest six bits into the highest bits of a 64-bit long, discarding the others
 *     (unsigned, so the right shift below is logical and cannot smear sign bits)
 *
 * >> (6 * x + 16)
 *     shifts the bits into the expected final position
 */
#define L(x) ((long)(((unsigned long)buf[x] << 58) >> (6 * (x) + 16)))
long rv = L(0) | L(1) | L(2) | L(3) | L(4) | L(5) | L(6) | L(7);
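To check the unrolled form against the original loop, here is a minimal sketch. The names pack6_loop, pack6_unrolled and FIELD are made up for illustration, and it uses uint64_t instead of long so that it doesn't depend on long being 64 bits wide:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the loop from the question, with an unsigned 64-bit result. */
static uint64_t pack6_loop(const unsigned char *buf, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t rv = 0;
    for (int i = 0; i < n; i++)
        rv = (rv << 6) | (buf[i] & 0x3f);
    return rv;
}

/*
 * Hypothetical macro: isolate the low six bits of byte x by shifting them
 * to the top of a 64-bit word, then move them down to their final slot.
 * Byte 0 ends up in bits 42..47, byte 7 in bits 0..5.
 */
#define FIELD(buf, x) (((uint64_t)(buf)[x] << 58) >> (6 * (x) + 16))

/* Unrolled equivalent of pack6_loop(buf, 8). */
static uint64_t pack6_unrolled(const unsigned char buf[8])
{
    return FIELD(buf, 0) | FIELD(buf, 1) | FIELD(buf, 2) | FIELD(buf, 3)
         | FIELD(buf, 4) | FIELD(buf, 5) | FIELD(buf, 6) | FIELD(buf, 7);
}
```

For any 8-byte input, pack6_unrolled(buf) should give the same value as pack6_loop(buf, 8). The unrolled version only covers n == 8; for smaller n the loop result corresponds to the unrolled result shifted right by 6 * (8 - n).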
As mentioned, I'm not aware of an SSE instruction that would help with this kind of packing (the SSE pack instructions narrow dword-to-word and word-to-byte).
You can perform the operations inside SSE registers, but not, as far as I can see, reduce the number of instructions required to get at the end result.
Upvotes: 4
Reputation: 6809
There are quite a few bitwise operations you can perform in SSE. You can just use _mm_and_si128 and _mm_or_si128, and there is a huge set of shift operations. Search for _mm_slli_si128 to find the complete list. These instructions were added in SSE2, so they're widely available.
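As a rough sketch of how those intrinsics could be combined for the packing in the question: since each shift moves every lane by the same amount, one option is to merge neighbours in three stages (bytes into 16-bit lanes, then 32-bit, then 64-bit). The function name pack6_sse2 is made up, and this assumes an x86 target with SSE2. It is not claimed to be faster than the scalar loop.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: packs the low 6 bits of the first n bytes of an
 * 8-byte buffer, giving the same value as the loop in the question. */
static uint64_t pack6_sse2(const unsigned char buf[8], int n)
{
    assert(n >= 0 && n <= 8);

    /* Load 8 bytes into the low half of an xmm register (upper half zeroed). */
    __m128i v = _mm_loadl_epi64((const __m128i *)buf);

    /* Keep the low 6 bits of every byte. */
    v = _mm_and_si128(v, _mm_set1_epi8(0x3f));

    /* Stage 1: per 16-bit lane, (low byte << 6) | high byte -> 12 bits. */
    __m128i lo = _mm_and_si128(v, _mm_set1_epi16(0x00ff));
    v = _mm_or_si128(_mm_slli_epi16(lo, 6), _mm_srli_epi16(v, 8));

    /* Stage 2: per 32-bit lane, (low word << 12) | high word -> 24 bits. */
    lo = _mm_and_si128(v, _mm_set1_epi32(0x0000ffff));
    v = _mm_or_si128(_mm_slli_epi32(lo, 12), _mm_srli_epi32(v, 16));

    /* Stage 3: per 64-bit lane, (low dword << 24) | high dword -> 48 bits. */
    lo = _mm_and_si128(v, _mm_set_epi32(0, -1, 0, -1));
    v = _mm_or_si128(_mm_slli_epi64(lo, 24), _mm_srli_epi64(v, 32));

    /* Store the low 64 bits back to memory. */
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, v);

    /* For n < 8, drop the fields that the loop would never have reached. */
    return out >> (6 * (8 - n));
}
```

Because x86 is little-endian, buf[0] sits in the lowest byte of each lane, so every stage moves the low half up and the high half down. That puts buf[0] in the most significant position, matching the loop.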
Upvotes: 0