Tomas

Reputation: 71

AVX512 duplicate low 256 bits into high 256 bits inside a zmm register

Is there a faster way to duplicate (copy) the low 256 bits of an AVX-512 register into the high 256 bits than using the _mm512_insertf64x4 intrinsic (the vinsertf64x4 instruction)?

My current solution is:

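__m512d zmm1 = _mm512_load_pd(mem);  // mem must be 64-byte aligned
// _mm512_insertf64x4 takes a __m256d as its second operand, so cast the low half first
zmm1 = _mm512_insertf64x4(zmm1, _mm512_castpd512_pd256(zmm1), 1);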

Or, equivalently, is there a faster way to load 256 bits (4 doubles) from memory and place them in both the low and high 256-bit halves of a 512-bit zmm register?
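
For comparison, here is a minimal sketch of two candidates that might be worth benchmarking against the insert approach. It has not been measured. The helper names dup4_broadcast and dup_low_shuffle are hypothetical, as are the pointer-based signatures. The first uses _mm512_broadcast_f64x4 (vbroadcastf64x4), which can take its 256-bit source straight from memory. The second uses _mm512_shuffle_f64x2 (vshuff64x2) for the case where the data is already in a zmm register. The target attribute is a GCC/Clang assumption so the file builds without -mavx512f; the caller is still responsible for checking CPU support.

```c
#include <immintrin.h>

/* Hypothetical helper: load 4 doubles from src4 and write them to both
   256-bit halves of an 8-double destination. Unaligned loads/stores are
   used so the sketch makes no alignment assumptions. */
__attribute__((target("avx512f")))
static void dup4_broadcast(const double *src4, double *dst8)
{
    __m256d lo = _mm256_loadu_pd(src4);       /* the 4 source doubles */
    __m512d v  = _mm512_broadcast_f64x4(lo);  /* vbroadcastf64x4: copy to both halves */
    _mm512_storeu_pd(dst8, v);
}

/* Hypothetical helper: the value is already a full 512-bit vector, and its
   low 256 bits are copied over its high 256 bits. */
__attribute__((target("avx512f")))
static void dup_low_shuffle(const double *src8, double *dst8)
{
    __m512d v = _mm512_loadu_pd(src8);
    /* imm8 = 0x44 selects 128-bit lanes 0,1 from the first operand for the
       low half and lanes 0,1 from the second operand for the high half. */
    v = _mm512_shuffle_f64x2(v, v, 0x44);
    _mm512_storeu_pd(dst8, v);
}
```

Both helpers should produce the same result as the insert version; whether either is actually faster depends on the microarchitecture and on whether the compiler folds the load into the broadcast, so it needs measuring.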

Upvotes: 1

Views: 81

Answers (0)
