Reputation: 71
Is there a faster way to duplicate (copy) the low 256 bits of an AVX-512 register into its high 256 bits than using the _mm512_insertf64x4 intrinsic?
My current solution is:
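__m512d zmm1 = _mm512_load_pd(mem);  // mem must be 64-byte aligned
// _mm512_insertf64x4 takes a __m256d as its second argument, so cast the low half first
zmm1 = _mm512_insertf64x4(zmm1, _mm512_castpd512_pd256(zmm1), 1);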
Or, equivalently, is there a faster way to load 256 bits (4 doubles) from memory and store them in both the low and high 256-bit halves of a 512-bit zmm register?
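To make the two formulations concrete, here is a minimal sketch of each, not a measured answer. It assumes GCC or Clang on x86-64. The helper names (`broadcast4_from_mem`, `dup_low_half`) and the scalar fallback are hypothetical and exist only so the snippet also runs on machines without AVX-512. The memory case uses `_mm512_broadcast_f64x4`, which typically compiles to a single `vbroadcastf64x4` with a memory operand. The register case uses `_mm512_shuffle_f64x2`, which is still a lane-crossing shuffle and so may well cost about the same as the insert.

```c
#include <immintrin.h>
#include <string.h>

/* Memory case: load 4 doubles and replicate them into both 256-bit halves. */
__attribute__((target("avx512f")))
static void broadcast4_avx512(const double *src, double *dst)
{
    __m512d v = _mm512_broadcast_f64x4(_mm256_loadu_pd(src));
    _mm512_storeu_pd(dst, v);
}

/* Register case: copy the low 256 bits of v over its high 256 bits.
   The result 128-bit lanes are a[0], a[1], b[0], b[1], with a == b == v. */
__attribute__((target("avx512f")))
static void dup_low_half_avx512(const double *src, double *dst)
{
    __m512d v = _mm512_loadu_pd(src);
    v = _mm512_shuffle_f64x2(v, v, _MM_SHUFFLE(1, 0, 1, 0));
    _mm512_storeu_pd(dst, v);
}

/* Hypothetical wrappers with a scalar fallback (an assumption added for
   portability, not part of the question). src has 4 doubles, dst has 8. */
void broadcast4_from_mem(const double *src, double *dst)
{
    if (__builtin_cpu_supports("avx512f")) {
        broadcast4_avx512(src, dst);
    } else {
        memcpy(dst, src, 4 * sizeof(double));
        memcpy(dst + 4, src, 4 * sizeof(double));
    }
}

/* src and dst both have 8 doubles; only the low 4 of src reach dst. */
void dup_low_half(const double *src, double *dst)
{
    if (__builtin_cpu_supports("avx512f")) {
        dup_low_half_avx512(src, dst);
    } else {
        memcpy(dst, src, 4 * sizeof(double));
        memcpy(dst + 4, src, 4 * sizeof(double));
    }
}
```

With `src = {1, 2, 3, 4, 5, 6, 7, 8}`, both helpers should leave `{1, 2, 3, 4, 1, 2, 3, 4}` in `dst`. Whether either variant is actually faster than the insert depends on the microarchitecture and would need benchmarking.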
Upvotes: 1
Views: 81