Philippe

Reputation: 670

Vectorization: multiply __m256i elements

I'm looking to multiply all the 32-bit integers in a register at once using SIMD instructions. This is what I have tried so far:

  int32_t a [8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int32_t b [8] = {1, 2, 3, 4, 5, 6, 7, 8};
  __m256i tmp1 = _mm256_loadu_si256((__m256i*) a);
  __m256i tmp2 = _mm256_loadu_si256((__m256i*) b);

  __m256i tmp3 = _mm256_mul_epi32(tmp1,tmp2);

Sadly, it doesn't yield the correct result. This is what I get: 1, 0, 9, 0, 25, 0, 49, 0

I haven't found an alternative instruction yet. Any help would be appreciated.

Upvotes: 1

Views: 1178

Answers (1)

rafix07

Reputation: 20959

If you multiply 32-bit integers with _mm256_mul_epi32, each product is 64 bits wide. The instruction reads only the low 32 bits of each 64-bit lane (the even-indexed elements) and works as follows:

a[0] * b[0] = tmp3[1:0]    1 * 1 = 1
a[2] * b[2] = tmp3[3:2]    3 * 3 = 9
a[4] * b[4] = tmp3[5:4]    5 * 5 = 25
a[6] * b[6] = tmp3[7:6]    7 * 7 = 49

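So tmp3 holds only 4 results, each 64 bits wide. When you read it back as eight 32-bit integers, the upper half of each small product is 0, which is where the zeros in your output come from.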
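To see those four 64-bit products directly, the result can be stored into an int64_t array instead of an int32_t one. This is only a minimal sketch: the function name widening_mul_even is made up for illustration, and the target attribute is a GCC/Clang extension assumed here so the function builds without -mavx2. It must only be called on a CPU that supports AVX2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: multiplies the even-indexed 32-bit elements of a and b
   (indices 0, 2, 4, 6) and stores the four signed 64-bit products in out.
   The target attribute (GCC/Clang extension) enables AVX2 for this function only. */
__attribute__((target("avx2")))
static void widening_mul_even(const int32_t *a, const int32_t *b, int64_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Uses only the low 32 bits of each 64-bit lane; each product is 64 bits. */
    __m256i prod = _mm256_mul_epi32(va, vb);

    _mm256_storeu_si256((__m256i *)out, prod);
}
```

With the arrays from the question, out would contain 1, 9, 25, 49.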

You can try _mm256_mullo_epi32 instead. This instruction multiplies each 32-bit element of a by the corresponding element of b, but only the low 32 bits of each 64-bit product are stored in the result.
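A minimal sketch of that approach follows. The names mul8_avx2 and mul8 are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here so the example builds without -mavx2 and falls back to a scalar loop on CPUs without AVX2.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: element-wise product of eight 32-bit integers,
   keeping the low 32 bits of each product. */
__attribute__((target("avx2")))
static void mul8_avx2(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* All eight 32-bit lanes are multiplied; high halves are discarded. */
    __m256i prod = _mm256_mullo_epi32(va, vb);

    _mm256_storeu_si256((__m256i *)out, prod);
}

/* Hypothetical wrapper: uses AVX2 when the CPU has it, otherwise a scalar loop.
   The scalar path multiplies as unsigned so overflow wraps, matching the
   low-32-bit behaviour of the intrinsic. */
static void mul8(const int32_t *a, const int32_t *b, int32_t *out)
{
    if (__builtin_cpu_supports("avx2")) {
        mul8_avx2(a, b, out);
    } else {
        for (int i = 0; i < 8; ++i)
            out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
    }
}
```

With the arrays from the question, mul8(a, b, out) would fill out with 1, 4, 9, 16, 25, 36, 49, 64.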

Upvotes: 4
