anup

Reputation: 539

Which one is faster?

I am using SSE2 with gcc 4.4.3. In my program, I only need the least significant 8 bits (bits 0-7) of a 128-bit SIMD register. Please suggest a way to retrieve those 8 bits quickly.

I tried _mm_movepi64_pi64 and _mm_extract_epi16, and both give similar performance in my program. I also tried a union approach: union { __m128i a1; int a2[4]; }. Although it gave good results in a test case, it did not perform well in my actual program.

Any ideas? Which of the three approaches above should I use?

Upvotes: 1

Views: 618

Answers (1)

Peter Cordes

Reputation: 364009

_mm_movepi64_pi64 moves from XMM to MMX registers. There's no way it's the right choice, unless you want to do some more SIMD in MMX registers, and your code runs out of XMM regs.

If you want the bits as an array index or something, they have to be in a GP register, in which case you want SSE4.1 _mm_extract_epi8.
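A minimal sketch of the SSE4.1 route, assuming x86-64 with a reasonably recent GCC or Clang, where `__attribute__((target("sse4.1")))` lets one function use SSE4.1 intrinsics without building the whole file with -msse4.1 (older compilers such as gcc 4.4 may instead require -msse4.1 on the command line). The helper name `get_byte5_sse41` is hypothetical, and the CPU running it must support SSE4.1.

```c
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: return byte 5 of v in a general-purpose register.
   target("sse4.1") enables pextrb for this function only (GCC/Clang). */
__attribute__((target("sse4.1")))
static uint8_t get_byte5_sse41(__m128i v)
{
    /* pextrb r32, xmm, 5 zero-extends the selected byte.
       The index is an imm8, so it must be a compile-time constant. */
    return (uint8_t)_mm_extract_epi8(v, 5);
}
```

Because the result is already zero-extended, it can be used directly as an array index.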

If you need to stick to SSE2, this should be the fastest way to get byte 5 of xmm0:

pextrw eax, xmm0, 2
movzx eax, ah

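So, hopefully, this expression will get the compiler to emit efficient code like that (n must be a compile-time constant, because _mm_extract_epi16 takes an immediate operand):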

(uint8_t)(_mm_extract_epi16(var, n/2) >> ((n%2) * 8))
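Wrapped up as a reusable SSE2-only helper, the expression above might look like the sketch below. The macro and function names are hypothetical; the macro is needed because the word index passed to _mm_extract_epi16 must be a constant. Word n/2 holds byte n in its low half when n is even and in its high half when n is odd, which is what the shift selects. Whether the compiler actually emits the pextrw + movzx ah pair is not guaranteed; it may use a shr instead.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical macro: byte n of v using only SSE2.
   n must be a compile-time constant (pextrw takes an imm8). */
#define EXTRACT_BYTE_SSE2(v, n) \
    ((uint8_t)(_mm_extract_epi16((v), (n) / 2) >> (((n) % 2) * 8)))

static uint8_t get_byte5_sse2(__m128i v)
{
    /* Hopefully compiles to something like:
       pextrw eax, xmm0, 2 ; movzx eax, ah */
    return EXTRACT_BYTE_SSE2(v, 5);
}
```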

Less efficient would be a byte shift with _mm_bsrli_si128 (psrldq) to put the byte you want into the low byte of an XMM register, followed by a movd (_mm_extract_epi16(var, 0) emits movd, not pextrw r32, xmm, 0, fortunately). This way you don't have to do anything extra if the byte you want is an odd-numbered byte, which pextrw would leave in the high 8 bits of its result. There is still no easy way to use this with an index that isn't a compile-time constant.
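A sketch of the shift-then-movd variant. It uses _mm_srli_si128, the original SSE2 name for the same psrldq byte shift (available in old compilers such as gcc 4.4; _mm_bsrli_si128 is a later alias), and _mm_cvtsi128_si32 for the movd. The macro name is hypothetical, and n must again be a compile-time constant.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical macro: shift byte n of v down to byte 0 (psrldq),
   move the low 32 bits to a GP register (movd), then truncate.
   Odd and even n are handled the same way; no extra shift needed. */
#define EXTRACT_BYTE_SHIFT(v, n) \
    ((uint8_t)_mm_cvtsi128_si32(_mm_srli_si128((v), (n))))
```

For n == 0 the shift is unnecessary and `(uint8_t)_mm_cvtsi128_si32(v)` alone is enough, which covers the "low 8 bits" case from the question.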

Storing the 16 bytes to memory and loading the element you want should also be fairly good. (That's probably what you'll get with the union approach, unless the compiler optimizes it to a pextr instruction.) The compiler will use a 16B-aligned location on the stack, so store-to-load forwarding should work fine in this case and the latency will be low. If you need two separate elements in two separate integer variables, this is probably the best choice, and it may beat multiple pextrw instructions.
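A minimal sketch of the store/reload approach for two elements, assuming GCC or Clang (for `__attribute__((aligned(16)))`). The function name is hypothetical. Reading the buffer through uint8_t is always allowed by C's aliasing rules. Unlike the pextrw and psrldq versions, the indices here can be run-time values; masking them with 15 is an added safety choice in this sketch, not part of the original answer.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: store v once, then do two scalar byte loads. */
static void extract_two_bytes(__m128i v, unsigned i, unsigned j,
                              uint8_t *out_i, uint8_t *out_j)
{
    /* 16-byte-aligned scratch buffer so the store can be a movdqa */
    uint8_t buf[16] __attribute__((aligned(16)));
    _mm_store_si128((__m128i *)buf, v);
    /* Both loads should be served by store-to-load forwarding.
       Only the low 4 bits of each index are used. */
    *out_i = buf[i & 15];
    *out_j = buf[j & 15];
}
```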

Upvotes: 1
