Reputation: 1091
What is the best way to multiply each 32-bit entry of two __m256i registers with each other?
_mm256_mul_epu32 is not what I'm looking for because it produces 64-bit outputs. I want a 32-bit result for every 32-bit input element.
Moreover, I'm sure that the multiplication of two 32-bit values will not overflow.
Thanks!
Upvotes: 4
Views: 2480
Reputation: 11758
You want the _mm256_mullo_epi32() intrinsic. From Intel's excellent online intrinsics guide:
Synopsis
__m256i _mm256_mullo_epi32 (__m256i a, __m256i b)
#include "immintrin.h"
Instruction: vpmulld ymm, ymm, ymm
CPUID Flags: AVX2
Description
Multiply the packed 32-bit integers in a and b, producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in dst.
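As a rough illustration of the description above, here is a minimal sketch that multiplies eight 32-bit lanes and keeps the low 32 bits of each product. The helper names mul8_epi32 and mul8_epi32_avx2 are made up for this example. It also assumes GCC or Clang on x86: the target("avx2") attribute and __builtin_cpu_supports are compiler extensions, used here only so the file builds without -mavx2 and falls back to scalar code on CPUs without AVX2. Because only the low 32 bits are kept, the result is the same whether the inputs are treated as signed or unsigned.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: AVX2 path. Multiplies eight 32-bit lanes of a and b
   and stores the low 32 bits of each product in out. */
__attribute__((target("avx2")))
static void mul8_epi32_avx2(const int32_t *a, const int32_t *b, int32_t *out)
{
    /* Unaligned loads, so the arrays need no special alignment. */
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* vpmulld: per-lane 32x32 multiply, low 32 bits of each result. */
    __m256i prod = _mm256_mullo_epi32(va, vb);

    _mm256_storeu_si256((__m256i *)out, prod);
}

/* Hypothetical wrapper: uses AVX2 when the CPU reports it (GCC/Clang
   builtin, an assumption of this sketch), otherwise a scalar loop. */
static void mul8_epi32(const int32_t *a, const int32_t *b, int32_t *out)
{
    if (__builtin_cpu_supports("avx2")) {
        mul8_epi32_avx2(a, b, out);
        return;
    }
    for (int i = 0; i < 8; ++i) {
        /* Multiply as unsigned to get wraparound without signed overflow. */
        out[i] = (int32_t)((uint32_t)a[i] * (uint32_t)b[i]);
    }
}
```

If the whole file is already compiled with -mavx2, the attribute and the runtime check can be dropped and the three intrinsics called directly.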
Upvotes: 7