Reputation: 2178
I'm trying to understand the SSE strstr implementation, and one particular function is doing something I don't quite understand wrt loading a const unsigned char* into an __m128i. The function is the __m128i_strloadu function (taken from here: http://strstrsse.googlecode.com/svn-history/r135/trunk/strmatch/lib/strstrsse42.c):
static inline __m128i __m128i_strloadu (const unsigned char * p) {
    int offset = ((size_t) p & (16 - 1));
    if (offset && (int) ((size_t) p & 0xfff) > 0xff0) {
        __m128i a = _mm_load_si128 ((__m128i *) (p - offset));
        __m128i zero = _mm_setzero_si128 ();
        // I don't understand what this movemask, in concert
        // with the shift right comparison below, are accomplishing
        int bmsk = _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, zero));
        if ((bmsk >> offset) != 0) {
            return __m128i_shift_right(a, offset);
        }
    }
    return _mm_loadu_si128 ((__m128i *) p);
}
I feel like this is a simple align-to-16-bytes operation taking place, but I'm having trouble visualizing /how/ it's happening. What does the movemask comparison accomplish here, and what is it checking for?
Upvotes: 2
Views: 513
Reputation: 64904
It's testing whether the end of the string falls inside this aligned 16-byte block. If it does, the function shifts out the extra leading bytes and returns the result. Otherwise it goes ahead and does the normal unaligned load, which avoids the shift and returns "more of this string" instead of the spurious zeroes a shift would bring in.
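The aligned-load path in the question's code is only entered when (p & 0xfff) > 0xff0. Assuming 4 KiB pages, which the 0xfff constant suggests, that condition holds exactly when the 16 bytes starting at p would run past the end of p's page. Presumably the point is to avoid touching the next page unless the string really continues into it. A minimal sketch of that predicate follows; the helper name load16_crosses_page is hypothetical and does not appear in the original source.

```c
#include <stdint.h>

/* Hypothetical helper, not part of strstrsse42.c.
 * Assumes 4 KiB (0x1000-byte) pages, as the 0xfff mask in the question implies.
 * Returns 1 when a 16-byte load starting at addr would extend into the
 * following page, 0 otherwise. */
static int load16_crosses_page(uintptr_t addr) {
    /* Offsets 0x000..0xff0 leave at least 16 bytes before the page end;
     * offsets 0xff1..0xfff do not. */
    return (addr & 0xfff) > 0xff0;
}
```

Any address that satisfies this test has nonzero low four bits, so the separate offset check in the original condition never changes the outcome.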
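The mask records which bytes in the 16-byte block are zero: bit i is set when byte i is zero. bmsk >> offset is the part of the mask that covers the bytes that were actually asked for (starting from p); the low bits that get shifted away belong to the extra bytes before p, which were only loaded because of alignment.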
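To make the mask arithmetic concrete, here is a small sketch of the two steps. The helper names are hypothetical, and it uses _mm_loadu_si128 so the example does not depend on how the caller's buffer is aligned, whereas the original uses an aligned load on p - offset.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: bit i of the result is set when block[i] == 0.
 * block must point to 16 readable bytes. */
static int zero_byte_mask(const unsigned char *block) {
    __m128i a = _mm_loadu_si128((const __m128i *)block);
    /* cmpeq yields 0xFF in each zero byte; movemask gathers the top bit
     * of every byte into a 16-bit integer mask. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(a, _mm_setzero_si128()));
}

/* Hypothetical helper: nonzero when a terminator lies at or after
 * block[offset], i.e. within the bytes the caller actually asked for. */
static int terminator_in_requested_bytes(const unsigned char *block, int offset) {
    /* Shifting right by offset discards the bits for the bytes before p. */
    return (zero_byte_mask(block) >> offset) != 0;
}
```

For example, if the block holds twelve unrelated nonzero bytes followed by "abc" and a terminator, and offset is 12, the mask is 0x8000 and 0x8000 >> 12 is 0x8, so the string ends inside the block. A zero byte at index 1 alone gives mask 0x0002, which shifts to 0: that zero belongs to whatever precedes p and is ignored.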
Upvotes: 1