Reputation: 43
I have seen recently that Visual Studio 2019 Preview has added an option to compile with AVX512. OK, I tried it and it worked. But why does it work while my CPU has no such capability?
I am using the following C/C++ script to detect the CPU capabilities: https://learn.microsoft.com/en-us/cpp/intrinsics/cpuid-cpuidex?view=vs-2019
All AVX512 flags (AVX512F, AVX512CD, AVX512PF and AVX512ER) are unavailable on my system when running this script.
Visual Studio 2019 Preview has the following options [AVX, AVX2, AVX512, SSE and SSE2]. Software compiled with AVX, AVX2, SSE and SSE2 works on my PC, and the script linked above says that my PC supports all four of these.
As you can understand now, the only problem seems to be the AVX512 capability. It works on my PC but every script I run says that I have no AVX512.
Thanks!
Upvotes: 0
Views: 2348
Reputation: 468
MSVC's compiler is a multi-versioning auto-vectorizer. That is, when you specify AVX-512 code generation, it will also generate AVX2, AVX, SSE, MMX, and pure scalar fallback code, and it will add a run-time check for the highest instruction set available.
See the Auto-Vectorizer Section: https://learn.microsoft.com/en-us/cpp/parallel/auto-parallelization-and-auto-vectorization?view=msvc-160
Please note that this does not happen for intrinsic functions such as:
_mm256_add_ps(__m256, __m256); // AVX floating-point add (takes __m256 vectors, not float*)
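If intrinsics are used directly, a run-time guard along the following lines could protect the AVX-512 path. This is only a sketch, not MSVC's own mechanism: the names CpuidRegs, cpuid, read_xcr0, bit_set and cpu_and_os_support_avx512f are made up for illustration, and the MSVC branch assumes __cpuidex and _xgetbv from the compiler's intrinsic headers. It checks CPUID leaf 7 (EBX bit 16, AVX512F) and also that the OS saves the opmask and ZMM state (XCR0).

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
  #include <intrin.h>     // __cpuidex
  #include <immintrin.h>  // _xgetbv
  #define SKETCH_X86 1
#elif defined(__x86_64__) || defined(__i386__)
  #include <cpuid.h>      // __get_cpuid_count (GCC/Clang)
  #define SKETCH_X86 1
#else
  #define SKETCH_X86 0    // not x86: every feature reports as absent
#endif

// Hypothetical helper type holding the four CPUID output registers.
struct CpuidRegs { std::uint32_t eax, ebx, ecx, edx; };

// True if bit number `bit` (0..31) is set in `value`.
inline bool bit_set(std::uint32_t value, unsigned bit) {
    assert(bit < 32);
    return ((value >> bit) & 1u) != 0;
}

// Query a basic CPUID leaf/subleaf; returns all zeros if it is unsupported.
inline CpuidRegs cpuid(std::uint32_t leaf, std::uint32_t subleaf) {
    CpuidRegs r = {0, 0, 0, 0};
    (void)leaf; (void)subleaf;
#if SKETCH_X86 && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuidex(regs, 0, 0);  // leaf 0: EAX holds the highest basic leaf
    if (static_cast<std::uint32_t>(regs[0]) >= leaf) {
        __cpuidex(regs, static_cast<int>(leaf), static_cast<int>(subleaf));
        r.eax = static_cast<std::uint32_t>(regs[0]);
        r.ebx = static_cast<std::uint32_t>(regs[1]);
        r.ecx = static_cast<std::uint32_t>(regs[2]);
        r.edx = static_cast<std::uint32_t>(regs[3]);
    }
#elif SKETCH_X86
    unsigned a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid_count(leaf, subleaf, &a, &b, &c, &d)) {
        r.eax = a; r.ebx = b; r.ecx = c; r.edx = d;
    }
#endif
    return r;
}

// Read XCR0. Only call this after CPUID reports OSXSAVE.
inline std::uint64_t read_xcr0() {
#if SKETCH_X86 && defined(_MSC_VER)
    return _xgetbv(0);
#elif SKETCH_X86
    std::uint32_t lo = 0, hi = 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return 0;
#endif
}

// AVX512F is usable only if both the CPU and the OS support it.
inline bool cpu_and_os_support_avx512f() {
    const CpuidRegs leaf1 = cpuid(1, 0);
    if (!bit_set(leaf1.ecx, 27)) return false;  // OSXSAVE not enabled
    // XCR0 bits 1,2 (SSE, AVX state) and 5,6,7 (opmask, ZMM_Hi256, Hi16_ZMM).
    const std::uint64_t needed = 0xE6;
    if ((read_xcr0() & needed) != needed) return false;
    const CpuidRegs leaf7 = cpuid(7, 0);
    return bit_set(leaf7.ebx, 16);              // AVX512F feature bit
}
```

A caller would branch on cpu_and_os_support_avx512f() once at startup and pick the AVX-512 or fallback routine accordingly. On a CPU like the one in the question, it should return false.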
Upvotes: 0
Reputation: 365312
Presumably the compiler chose not to actually use any AVX512 instructions when auto-vectorizing. Or only in functions that don't get called in your test-cases.
Enabling AVX512 means the compiler can choose to use AVX512 instructions, not that it definitely will. If it doesn't, then it doesn't have any instructions that will fault on CPUs without AVX512.
I don't know what MSVC's default tuning options are, but using 512-bit vectors isn't always profitable, especially for programs that spend most of their time in scalar code. (Running a 512-bit uop reduces max turbo for the next few milliseconds on current Skylake-X CPUs that do support AVX512.)
For 256-bit vectors, sometimes it's useful to use an AVX512VL instruction (EVEX encoding) like combining multiple boolean ops with vpternlogd, or one of the new shuffles like vpermt2d. Or an EVEX encoding of an instruction available in AVX2 or earlier just to use more registers (ymm16..31) or for masked operations.
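To make "combining multiple boolean ops" concrete, here is a scalar model of what vpternlogd computes in one 32-bit lane. It is an illustration only: ternlog32 and ternlog_self_check are hypothetical names, and real code would use the instruction (or its intrinsic) rather than this loop. The 8-bit immediate is a truth table indexed by the three input bits, with the first operand as the most significant index bit.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 32-bit lane of vpternlogd: for every bit position,
// the bits of a, b and c form a 3-bit index (a is the high bit) that
// selects one bit of the truth table imm8.
inline std::uint32_t ternlog32(std::uint32_t a, std::uint32_t b,
                               std::uint32_t c, std::uint8_t imm8) {
    std::uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        const unsigned index = (((a >> bit) & 1u) << 2) |
                               (((b >> bit) & 1u) << 1) |
                               ((c >> bit) & 1u);
        result |= ((static_cast<std::uint32_t>(imm8) >> index) & 1u) << bit;
    }
    return result;
}

// A few truth tables that each replace two boolean instructions with one.
inline void ternlog_self_check() {
    const std::uint32_t a = 0xF0F0A5A5u, b = 0xCC33FF00u, c = 0x12345678u;
    assert(ternlog32(a, b, c, 0x96) == (a ^ b ^ c));            // 3-way XOR
    assert(ternlog32(a, b, c, 0xCA) == ((a & b) | (~a & c)));   // bitwise select
    assert(ternlog32(a, b, c, 0x80) == (a & b & c));            // 3-way AND
    assert(ternlog32(a, b, c, 0xFE) == (a | b | c));            // 3-way OR
}
```

Each of these expressions would take two instructions with AVX2 boolean ops, which is why a compiler might reach for the EVEX-encoded instruction even at 256-bit width.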
Or maybe none of your loops auto-vectorized, or maybe you didn't use an optimization level high enough to even try to auto-vectorize.
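As a way to test this, a loop like the one below is a typical auto-vectorization candidate. It is a minimal sketch with made-up function names (add_arrays, add_vectors). Assuming MSVC, building it with /O2 /arch:AVX512 and adding /Qvec-report:2 should report whether the loop was vectorized; whether the compiler then picks 512-bit instructions is still its own choice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple element-wise add with a countable loop and no dependencies
// between iterations: the kind of loop an auto-vectorizer can handle.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Convenience wrapper so callers do not have to manage raw pointers.
std::vector<float> add_vectors(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    add_arrays(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

If a program containing only loops like this still runs on a CPU without AVX512, inspecting the generated assembly for zmm registers or EVEX-only instructions would show whether any AVX-512 code was actually emitted.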
Upvotes: 4