Reputation: 19306
Along with AVX, Intel introduced the VEX encoding scheme into the Intel 64 and IA-32 architectures. This encoding scheme is used mostly with AVX instructions. I was wondering whether it's okay to intermix VEX-encoded instructions and what are now called "legacy SSE" instructions.
The main reason I'm asking is code size. Consider these two instructions:
shufps xmm0, xmm0, 0
vshufps xmm0, xmm0, xmm0, 0
I commonly use the first one to "broadcast" a scalar value to all the places in an XMM register. Now, the instruction set reference says that the only difference between these two (in this case) is that the VEX-encoded one clears the higher (>=128) bits of the YMM register. Supposing that I don't need that, what's the advantage of using the VEX-encoded version in this case? The first instruction takes 4 bytes (0FC6C000), the second takes 5 (C5F8C6C000).
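To make the one-byte difference concrete, here is a minimal C sketch that splits the 2-byte VEX prefix into its fields. The names `vex2_fields` and `decode_vex2` are hypothetical helpers invented for this illustration, and the sketch assumes the bytes are already known to be a 64-bit-mode instruction (in 32-bit mode 0xC5 can also be the LDS opcode, which this does not check). In the VEX form, the two bytes C5 F8 stand in for the single 0F escape byte of the legacy encoding, which is where the extra byte comes from.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of the 2-byte VEX prefix: 0xC5 followed by one payload byte.
 * Hypothetical helper type, for illustration only. */
typedef struct {
    bool     valid; /* false if the first byte is not 0xC5 */
    unsigned r;     /* REX.R equivalent (stored inverted in the prefix) */
    unsigned vvvv;  /* extra source register number (stored inverted) */
    unsigned l;     /* vector length: 0 = 128-bit, 1 = 256-bit */
    unsigned pp;    /* implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
} vex2_fields;

/* Decode the payload byte layout [~R | ~vvvv (4 bits) | L | pp (2 bits)]. */
vex2_fields decode_vex2(const uint8_t *code)
{
    vex2_fields f = {0};
    if (code[0] != 0xC5)
        return f;                    /* not a 2-byte VEX prefix */
    uint8_t p = code[1];
    f.valid = true;
    f.r     = ((p >> 7) & 1u) ^ 1u;  /* undo the inversion */
    f.vvvv  = ((p >> 3) & 0xFu) ^ 0xFu;
    f.l     = (p >> 2) & 1u;
    f.pp    = p & 3u;
    return f;
}
```

Passing the bytes C5 F8 C6 C0 00 to `decode_vex2` should give `vvvv = 0` (xmm0 as the extra source), `l = 0` (128-bit) and `pp = 0` (no implied prefix), followed by the same opcode, ModRM and immediate bytes (C6 C0 00) as the legacy form.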
Thanks for all the answers in advance.
Upvotes: 12
Views: 1442
Reputation: 1
It's not safe. According to Intel's software developer manual, the VEX.128 version zeros the upper half of the YMM register, while the legacy SSE version doesn't. Worse, some assemblers (like gas) may convert SHUFPS into VSHUFPS while creating the object file (when the -mavx flag is applied). I ran into this exact problem while working with an assembly file.
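The architectural difference described here can be sketched as a small software model in C. This is not how the hardware is implemented: `ymm_model`, `model_shufps` and `model_vshufps128` are hypothetical names for this illustration, and a YMM register is simply assumed to be eight floats with lanes 0..3 forming the XMM part.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one YMM register: lane[0..3] is the XMM part. */
typedef struct { float lane[8]; } ymm_model;

/* SHUFPS selection rule for the low 128 bits: two elements are picked
 * from the first source, two from the second, 2 bits of imm8 each. */
static void shuffle_low(float out[4], const float a[4], const float b[4],
                        uint8_t imm)
{
    out[0] = a[imm & 3];
    out[1] = a[(imm >> 2) & 3];
    out[2] = b[(imm >> 4) & 3];
    out[3] = b[(imm >> 6) & 3];
}

/* Legacy form, shufps dst, src, imm8: dst doubles as the first source
 * and lanes 4..7 are left untouched. */
void model_shufps(ymm_model *dst, const ymm_model *src, uint8_t imm)
{
    float tmp[4];
    shuffle_low(tmp, dst->lane, src->lane, imm);
    memcpy(dst->lane, tmp, sizeof tmp);
}

/* VEX.128 form, vshufps dst, src1, src2, imm8: same shuffle, but
 * lanes 4..7 of the destination are zeroed. */
void model_vshufps128(ymm_model *dst, const ymm_model *src1,
                      const ymm_model *src2, uint8_t imm)
{
    float tmp[4];
    shuffle_low(tmp, src1->lane, src2->lane, imm);
    memcpy(dst->lane, tmp, sizeof tmp);
    memset(dst->lane + 4, 0, 4 * sizeof(float));
}
```

With an immediate of 0 and the same register used for every operand, both functions broadcast lane 0 into lanes 0..3; only the second one changes lanes 4..7. Code that keeps live data in the upper half of a YMM register would therefore see different results if an assembler silently swapped one form for the other.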
Upvotes: -1
Reputation: 64903
On current implementations, as long as the upper halves of the YMM registers have been reset (with VZEROUPPER or VZEROALL), there is no penalty for using legacy SSE instructions.
As detailed on page 128 of Agner Fog's "Optimizing subroutines in assembly language", using legacy SSE instructions while (some) upper halves are in use carries a performance penalty. This penalty is incurred once when entering the state where YMM registers are split in the middle, and once again when leaving that state.
Mixing VEX-encoded 128-bit instructions and legacy SSE instructions is not a problem.
Upvotes: 12