Chris Zelenak

Reputation: 2178

Understanding bit alignment for an __m128i flag

I'm trying to understand the SSE strstr implementation, and one function does something I don't quite understand when loading a const unsigned char* into an __m128i. The function is __m128i_strloadu (taken from here: http://strstrsse.googlecode.com/svn-history/r135/trunk/strmatch/lib/strstrsse42.c):

static inline __m128i __m128i_strloadu (const unsigned char * p) {
 int offset = ((size_t) p & (16 - 1));

 if (offset && (int) ((size_t) p & 0xfff) > 0xff0) {
   __m128i a    = _mm_load_si128 ((__m128i *) (p - offset));
   __m128i zero = _mm_setzero_si128 ();
   // I don't understand what this movemask, in concert
   // with the shift right comparison below, are accomplishing
   int bmsk     = _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, zero));

   if ((bmsk >> offset) != 0) {
     return __m128i_shift_right(a, offset);
   }

 }
 return _mm_loadu_si128 ((__m128i *) p);
}

I feel like this is a simple align-to-16-bytes operation, but I'm having trouble visualizing how it happens. What does the movemask comparison accomplish here, and what is it checking for?

Upvotes: 2

Views: 513

Answers (1)

user555045

Reputation: 64904

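It's testing whether the end of the string (the terminating zero byte) lies in this aligned 16-byte block, at or after p. If so, it shifts out the extra leading bytes and returns the result, so nothing beyond the block is ever read. Otherwise it goes ahead and does the normal unaligned load, which avoids the shift and returns "more of this string" instead of spurious zeroes. The outer check, (p & 0xfff) > 0xff0, limits the special case to pointers where a 16-byte unaligned load would cross a 4 KiB boundary, presumably so that the load cannot run into an unmapped page past the end of the string.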
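To make the guard in the question's code concrete, here is a minimal sketch of the (p & 0xfff) > 0xff0 test on its own. The helper names load16_crosses_page and load16_crosses_page_examples are hypothetical, and the sketch assumes 4096-byte pages, which is what the 0xfff mask suggests but the source does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, assuming 4096-byte pages: nonzero when a 16-byte
   load starting at addr would read at least one byte of the next page. */
static inline int load16_crosses_page(uintptr_t addr) {
    /* Page offsets 0xff1..0xfff leave fewer than 16 bytes in the page. */
    return (addr & 0xfff) > 0xff0;
}

/* A few sample addresses, written as assertions. */
static inline void load16_crosses_page_examples(void) {
    assert(!load16_crosses_page(0x1ff0)); /* bytes 0x1ff0..0x1fff: same page */
    assert(load16_crosses_page(0x1ff1));  /* last byte is 0x2000: next page */
    assert(!load16_crosses_page(0x2000)); /* start of a page */
}
```

Any page offset above 0xff0 also has a nonzero low nibble, so offset is always nonzero when this test passes.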

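The mask records which bytes in the aligned 16-byte block are zero: bit i is set when byte i is zero. bmsk >> offset keeps only the part of the mask for the bytes that were actually asked for (starting from p); the low bits that get shifted out belong to the extra bytes before p, which were loaded only because of alignment.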
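A scalar model may help with visualizing the mask. This is only a sketch of what _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, zero)) computes, written without intrinsics so it runs anywhere; the names zero_byte_mask and terminator_at_or_after are made up for illustration and are not part of the library.

```c
#include <assert.h>

/* Scalar model of _mm_movemask_epi8(_mm_cmpeq_epi8(a, zero)):
   bit i of the result is set when byte i of the 16-byte block is 0. */
static inline int zero_byte_mask(const unsigned char block[16]) {
    int mask = 0;
    for (int i = 0; i < 16; i++) {
        if (block[i] == 0) {
            mask |= 1 << i;
        }
    }
    return mask;
}

/* Nonzero when a zero byte sits at position offset or later in the block,
   i.e. in the part of the block that starts at the original pointer p. */
static inline int terminator_at_or_after(const unsigned char block[16], int offset) {
    assert(offset >= 0 && offset < 16);
    /* Shifting right drops the bits for the bytes that precede p. */
    return (zero_byte_mask(block) >> offset) != 0;
}
```

For example, take offset 3 and a block with a stray zero at byte 1 (before p) and the string's terminator at byte 6. The mask is 0x42, and 0x42 >> 3 is 0x08: the stray zero is discarded and the remaining bit says the terminator is 3 bytes past p. If there were no zero at byte 6 or later, the shifted mask would be 0 and the function would fall through to the unaligned load.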

Upvotes: 1
