Daniel Kamil Kozar

Reputation: 19306

Is it okay to mix legacy SSE encoded instructions and VEX encoded ones in the same code path?

Along with AVX, Intel introduced the VEX encoding scheme into the Intel 64 and IA-32 architectures. This encoding scheme is used mostly with AVX instructions. I was wondering whether it's okay to intermix VEX-encoded instructions with what are now called "legacy SSE" instructions.

The main reason I'm asking is code size. Consider these two instructions:

shufps xmm0, xmm0, 0
vshufps xmm0, xmm0, xmm0, 0

I commonly use the first one to "broadcast" a scalar value to all four lanes of an XMM register. The instruction set reference says that the only difference between these two (in this case) is that the VEX-encoded one clears the upper bits (bit 128 and above) of the corresponding YMM register. Supposing that I don't need that, what's the advantage of using the VEX-encoded version in this case? The first instruction takes 4 bytes (0FC6C000), the second takes 5 (C5F8C6C000).
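To make the one-byte difference concrete, here is a minimal Python sketch that splits the second encoding into the fields of a 2-byte VEX prefix. The helper name `decode_vex2` is made up for illustration, and it only handles the 2-byte (C5) form of the prefix, which is what this instruction uses.

```python
def decode_vex2(encoding_hex):
    """Split an instruction that starts with a 2-byte VEX prefix (C5 xx).

    Hypothetical helper for illustration only; the 3-byte (C4) form
    is not handled.
    """
    raw = bytes.fromhex(encoding_hex)
    if raw[0] != 0xC5:
        raise ValueError("not a 2-byte VEX prefix")
    p = raw[1]
    return {
        "length": len(raw),
        "R": ((p >> 7) & 1) ^ 1,         # stored inverted in the prefix
        "vvvv": ((p >> 3) & 0xF) ^ 0xF,  # stored inverted: extra source register number
        "L": (p >> 2) & 1,               # 0 = 128-bit operation, 1 = 256-bit
        "pp": p & 3,                     # implied legacy prefix, 0 = none
        "opcode": raw[2],
    }


legacy = bytes.fromhex("0FC6C000")   # shufps  xmm0, xmm0, 0
vex = decode_vex2("C5F8C6C000")      # vshufps xmm0, xmm0, xmm0, 0

print(len(legacy), "bytes for the legacy SSE form")
print(vex["length"], "bytes for the VEX form:", vex)
```

The extra byte pays for the `vvvv` field, which names the additional (non-destructive) source operand; here it decodes to register 0, i.e. xmm0, and `L` = 0 marks the operation as 128-bit.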

Thanks for all the answers in advance.

Upvotes: 12

Views: 1442

Answers (2)

Sujon

Reputation: 1

It's not safe. According to Intel's Software Developer's Manual, the VEX.128 version zeros the upper half of the YMM register, while the legacy SSE version doesn't. Worse, some assemblers (like gas) may convert SHUFPS into VSHUFPS while creating the object file (when the -mavx flag is applied). I ran into exactly this problem while working with an assembly file.

Upvotes: -1

user555045

Reputation: 64903

On current implementations, if (at least) the upper halves have been reset (with VZEROUPPER or VZEROALL), there is no penalty for using legacy SSE instructions.

As detailed on page 128 of Agner Fog's "Optimizing subroutines in assembly", using legacy SSE instructions while (some) upper halves are in use carries a performance penalty. This penalty is incurred once when entering the state where the YMM registers are split in the middle, and once again when leaving that state.

Mixing VEX-encoded 128-bit instructions and legacy SSE instructions is not a problem.
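The rules above can be sketched as a toy state machine. This is only a simplified model under stated assumptions, not a description of any specific CPU: the state names, the instruction categories, and the function `count_transition_penalties` are all invented for illustration, and the model assumes that VZEROUPPER always returns to the clean state at no modeled cost and that any VEX-encoded instruction leaves the split state.

```python
CLEAN, DIRTY, SPLIT = "clean", "dirty", "split"


def count_transition_penalties(instructions):
    """Count state-transition penalties in a sequence of instruction kinds.

    Toy model (assumption, not measured behavior). Kinds:
      "sse"        legacy SSE encoding
      "vex128"     VEX-encoded 128-bit instruction
      "avx256"     VEX-encoded 256-bit instruction
      "vzeroupper" VZEROUPPER or VZEROALL
    """
    state = CLEAN
    penalties = 0
    for kind in instructions:
        if kind == "vzeroupper":
            state = CLEAN                # assumed free in this model
        elif kind == "avx256":
            if state == SPLIT:
                penalties += 1           # leaving the split state
            state = DIRTY                # upper halves now in use
        elif kind == "vex128":
            if state == SPLIT:
                penalties += 1           # leaving the split state
                state = DIRTY
        elif kind == "sse":
            if state == DIRTY:
                penalties += 1           # entering the split state
                state = SPLIT
        else:
            raise ValueError("unknown instruction kind: " + kind)
    return penalties


# Upper halves never used: mixing is free in this model.
print(count_transition_penalties(["vex128", "sse", "vex128", "sse"]))
# Legacy SSE between 256-bit code: one penalty in, one penalty out.
print(count_transition_penalties(["avx256", "sse", "avx256"]))
# Same, but with VZEROUPPER before the legacy SSE code.
print(count_transition_penalties(["avx256", "vzeroupper", "sse", "vex128"]))
```

In this model the question's case (128-bit VEX next to legacy SSE, upper halves never touched) never leaves the clean state, so the choice between the two encodings comes down to code size.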

Upvotes: 12
