Adam Lee

Reputation: 25738

How can I use SSE instructions on the x64 architecture in C++?

Currently I am using Visual C++ inline assembly to implement some core functions with SSE; however, I just realised that inline assembly is not supported in x64 mode.

How can I use SSE when I build my software for the x64 architecture?

Upvotes: 3

Views: 3705

Answers (1)

Z boson

Reputation: 33659

The modern way to use specific machine instructions in C/C++ is to use intrinsics. Intrinsics have several advantages over inline assembly:

  • You don't have to worry about 32-bit and 64-bit mode.
  • You don't need to worry about registers and register spilling.
  • No need to worry about AT&T versus Intel syntax.
  • No need to worry about calling conventions.
  • The compiler can optimize code written with intrinsics further, which it won't do with inline assembly.
  • Most intrinsics are compatible across GCC, MSVC, ICC, and Clang.
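As a rough illustration of what replacing inline assembly with intrinsics can look like, here is a minimal sketch that adds two float arrays four elements at a time. The function name `add_floats` is hypothetical and not from the question; the intrinsics themselves (`_mm_loadu_ps`, `_mm_add_ps`, `_mm_storeu_ps`) are standard SSE intrinsics from `<xmmintrin.h>`. The same source should build in both 32-bit and 64-bit mode.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#include <cstddef>

// Hypothetical helper (name chosen for illustration): out[i] = a[i] + b[i].
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    // Main loop: four floats per iteration, using unaligned loads and stores
    // so the pointers do not need 16-byte alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note that the code never names a register; the compiler picks the xmm registers and handles any spilling itself.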

I also like intrinsics because it's easy to emulate hardware with them, for example to prepare for AVX512.

You can find the list of intrinsics MSVC supports here. Intel also has better documentation on intrinsics, which mostly agrees with MSVC's.

But sometimes you still need or want inline assembly. In my opinion it's really stupid that Microsoft does not allow inline assembly in 64-bit mode. This means they have to define intrinsics for several things that other compilers can still do with inline assembly. One example is CPUID. Visual Studio has an intrinsic for CPUID but GCC still uses inline assembly. Another example is adc. For a long time MSVC had no intrinsic for adc but now it appears they do.
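To make the CPUID difference concrete, here is a minimal sketch of a wrapper that hides the compiler split. The names `CpuidRegs`, `query_cpuid`, and `has_sse2` are hypothetical; the sketch assumes MSVC's `__cpuidex` from `<intrin.h>` and the `__cpuid_count` macro from GCC/Clang's `<cpuid.h>`, which is itself implemented with inline assembly.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>   // MSVC: __cpuidex intrinsic
#else
#include <cpuid.h>    // GCC/Clang: __cpuid_count macro (inline asm underneath)
#endif

// Hypothetical struct holding the four registers CPUID writes.
struct CpuidRegs {
    std::uint32_t eax, ebx, ecx, edx;
};

// Hypothetical wrapper: run CPUID for the given leaf and subleaf.
inline CpuidRegs query_cpuid(std::uint32_t leaf, std::uint32_t subleaf = 0) {
    CpuidRegs r{};
#if defined(_MSC_VER)
    int regs[4];
    __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
    r.eax = static_cast<std::uint32_t>(regs[0]);
    r.ebx = static_cast<std::uint32_t>(regs[1]);
    r.ecx = static_cast<std::uint32_t>(regs[2]);
    r.edx = static_cast<std::uint32_t>(regs[3]);
#else
    __cpuid_count(leaf, subleaf, r.eax, r.ebx, r.ecx, r.edx);
#endif
    return r;
}

// SSE2 support is reported in bit 26 of EDX for leaf 1.
inline bool has_sse2() {
    return ((query_cpuid(1).edx >> 26) & 1u) != 0;
}
```

On any x64 processor `has_sse2()` should return true, since SSE2 is part of the x64 baseline.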

Additionally, because they have to create intrinsics for everything, it causes confusion. They have to create an intrinsic for mulx, but Intel's documentation for this is wrong. They also have to create intrinsics for adcx and adox, but their documentation disagrees with Intel's and the generated assembly shows that no intrinsic produces adox. So once again the programmer is left waiting for an intrinsic for adox. If they had just allowed inline assembly then there would be no problem.

But back to SSE. With few exceptions, e.g. _mm_set_epi64x in 32-bit mode on MSVC (I don't know if that's been fixed), the SSE/AVX/AVX2 intrinsics work as expected with MSVC, GCC, ICC, and Clang.

Upvotes: 9
