Krazer

Reputation: 515

Compiling library with SSE2 and AVX2

I'm using VS2015 to compile a library that contains both SSE2 and AVX2 code paths (the AVX2 path is only used if support is detected in the CPU). If I compile the library with /arch:AVX2 but only call the SSE2 code, I get an "illegal instruction" fault on _mm_set1_epi32, the first SSE2 intrinsic called. However, if I compile the lib with /arch:SSE2, calling the SSE2 code works fine.

Are the arch settings mutually exclusive? If not how should this be fixed? I have attempted both as a shared lib and static lib with the same issue.

This is the lib: https://github.com/Auburns/FastNoiseSIMD and there is an issue about it at https://github.com/Auburns/FastNoiseSIMD/issues/20, although I don't think they related it directly to AVX2 being enabled while calling SSE2 instructions.

Upvotes: 2

Views: 1499

Answers (1)

Chuck Walbourn

Reputation: 41057

If you build with /arch:AVX or /arch:AVX2, the primary impact is that all SSE code generated by the compiler will use the VEX prefix encoding, which allows for more efficient scheduling of registers. If you run such code on a system without AVX or AVX2 support, it will in fact fault with an illegal instruction.

In other words, _mm_set1_epi32 is an SSE2 intrinsic, but because you built with /arch:AVX2 the compiler emitted it using the VEX prefix. The /arch switch affects explicit intrinsics, compiler-generated floating-point math, the autovectorizer, etc.

If you want to support 'stock' SSE/SSE2, AVX, and AVX2 platforms, each with optimized codepaths using the automatic generation supported by the /arch switch, you need three different binaries (EXEs or DLLs), one per /arch setting.

See this blog post as well as this one

Note that the main difference between /arch:AVX and /arch:AVX2 is that with /arch:AVX2 the compiler will sometimes emit FMA3 instructions where the scheduler thinks that would be faster than a multiply followed by an add.

Upvotes: 3
