Reputation: 313
I have written a function which multiplies four ints simultaneously in an array using SSE. The only problem is that the four ints which are being multiplied at the same time come back reversed in the array. How can I solve this? For example, if I call the function on {1,2,3,4,5,6,7,8} and multiply by 2, I get {8,6,4,2,16,14,12,10} instead of {2,4,6,8,10,12,14,16}.
int * integerMultiplication(int *a, int c, int N) {
    __m128i X, Y;
    X = _mm_set1_epi32(c);
    for (int i=0;i<N;i+=4) {
        Y = _mm_set_epi32(a[i], a[i+1], a[i+2], a[i+3]);
        __m128i tmp1 = _mm_mul_epu32(X,Y); /* mul 2,0*/
        __m128i tmp2 = _mm_mul_epu32( _mm_srli_si128(X,4), _mm_srli_si128(Y,4)); /* mul 3,1 */
        __m128i ans = _mm_unpacklo_epi32(_mm_shuffle_epi32(tmp1, _MM_SHUFFLE (0,0,2,0)), _mm_shuffle_epi32(tmp2, _MM_SHUFFLE (0,0,2,0)));
        _mm_store_si128((__m128i*)&a[i], ans);
    }
    return a;
}
Upvotes: 1
Views: 84
Reputation: 212979
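You are initialising Y incorrectly (in reverse order) and very inefficiently. _mm_set_epi32 takes its arguments from the highest element down to the lowest, so a[i] ends up in the top lane instead of the bottom one, and building the vector from four scalars is much slower than a single load.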
Change:
Y = _mm_set_epi32(a[i], a[i+1], a[i+2], a[i+3]);
to:
Y = _mm_load_si128((const __m128i *)&a[i]);
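For reference, here is a minimal sketch of the whole function with that change applied. The name multiply_by_scalar is made up for illustration, and the sketch assumes N is a multiple of 4. It also swaps in the unaligned _mm_loadu_si128 / _mm_storeu_si128 so that it works even if a is not 16-byte aligned; if you can guarantee alignment, the aligned versions as in your original code are fine.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: multiplies a[0..n-1] by c in place, four ints at a time.
   Assumes n is a multiple of 4. Uses unaligned load/store (an assumption made
   here so the array does not have to be 16-byte aligned). */
int *multiply_by_scalar(int *a, int c, int n) {
    assert(n % 4 == 0);
    const __m128i x = _mm_set1_epi32(c);
    for (int i = 0; i < n; i += 4) {
        /* A load keeps memory order: a[i] goes to lane 0, a[i+3] to lane 3. */
        __m128i y = _mm_loadu_si128((const __m128i *)&a[i]);
        /* _mm_mul_epu32 multiplies lanes 0 and 2 into two 64-bit products. */
        __m128i even = _mm_mul_epu32(x, y);
        /* Shift right by 4 bytes so lanes 1 and 3 move into positions 0 and 2. */
        __m128i odd = _mm_mul_epu32(_mm_srli_si128(x, 4), _mm_srli_si128(y, 4));
        /* Keep the low 32 bits of each product, then interleave back to 0,1,2,3. */
        __m128i ans = _mm_unpacklo_epi32(
            _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
            _mm_shuffle_epi32(odd, _MM_SHUFFLE(0, 0, 2, 0)));
        _mm_storeu_si128((__m128i *)&a[i], ans);
    }
    return a;
}
```

Calling this on {1,2,3,4,5,6,7,8} with c = 2 should leave {2,4,6,8,10,12,14,16} in the array, which is the order you expected.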
Upvotes: 2