Fredrik

Reputation: 151

How do I enable the SSE4.2 instruction set in Visual C++?

I am using the BRIEF descriptor in OpenCV in Visual C++ 2010 to match points in two images.

The paper on the BRIEF descriptor says that matching can be sped up:

"The BRIEF descriptor uses hamming distance, which can be done extremely fast on modern CPUs that often provide a specific instruction to perform a XOR or bit count operation, as is the case in the latest SSE instruction set."

With SSE4.2 enabled, matching should be faster. My question is simply: how do I enable it in Visual C++?

An alternative would be to switch to another compiler that supports SSE4, for instance Intel's ICC. Is this really necessary?

Upvotes: 5

Views: 6942

Answers (4)

Jan Ringoš

Reputation: 141

It seems like Visual Studio 17.11.5 (and toolset 14.41) added /arch:SSE4.2.

The compiler does not emit many SSE4.2 instructions with it, about the same as with the old undocumented /d2archSSE42, but the option is now being documented: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

Upvotes: 2

Alex Guteniev

Reputation: 13689

You can pass the /arch: options in an undocumented way, as /d2... options, for example /d2archAVX.

/d2archSSE42 is accepted this way. It is the only option that is not also available through the documented /arch: switch.

Upvotes: 1

Adrian McCarthy

Reputation: 48012

The MSVC compiler has an /arch option for specifying the minimum architecture you want your program to target. Setting it like /arch:SSE2 will tell the compiler to assume that the CPU supports the SSE2 instructions, and it will automatically use them whenever the optimizer determines it's appropriate.

However, MSVC has no /arch:SSE4 or /arch:SSE42 option. A peek into the standard library implementation suggests that /arch:AVX or /arch:AVX2 also implies SSE4.2. For example, the MSVC implementation of the C++20 library function std::popcount will do a runtime check of the processor to see if it can use the SSE4.2 popcnt instruction. But if you target AVX, it skips the runtime check and just assumes the processor supports it.
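To illustrate the difference between the runtime check and the compile-time assumption, here is a minimal sketch. It is not the actual MSVC standard library code, and the names cpu_has_popcnt and popcount_soft are hypothetical. It assumes an x86 target, where CPUID leaf 1 reports popcnt support in bit 23 of ECX, and it treats the __AVX__ macro (defined under /arch:AVX on MSVC and -mavx on gcc/clang) as implying popcnt, which mirrors the library behaviour described above.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid on MSVC
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>    // __get_cpuid on gcc/clang
#endif

// Hypothetical helper: reports whether the popcnt instruction may be used.
inline bool cpu_has_popcnt() {
#if defined(__AVX__)
    // Built for AVX: assume popcnt is present and skip the runtime check.
    return true;
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};          // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);                    // CPUID leaf 1: feature flags
    return (regs[2] & (1 << 23)) != 0;   // ECX bit 23 = POPCNT
#elif defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;                    // leaf 1 not available
    }
    return (ecx & (1u << 23)) != 0;      // ECX bit 23 = POPCNT
#else
    return false;                        // not x86: no popcnt instruction
#endif
}

// Hypothetical software fallback for CPUs without popcnt:
// clears the lowest set bit once per iteration.
inline unsigned popcount_soft(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}
```

A caller would test cpu_has_popcnt() once, cache the result, and pick either a hardware popcnt path or popcount_soft accordingly.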

I think gcc and clang do have specific options for enabling SSE4 and SSE4.2. Update: Peter Cordes confirms in the comments: "To enable popcnt specifically, -mpopcnt, or for SSE4.2 -msse4.2 which implies popcnt."

You can also use intrinsic functions for built-in instructions if you don't want to rely on the optimizer and the library implementation to find the optimal instructions.

Upvotes: 3

Sam

Reputation: 20056

Unfortunately, it doesn't work like that.

The C/C++ compiler can be told to use a specific instruction set under Project -> C/C++ -> Code Generation -> Enable Enhanced Instruction Set. But it does almost nothing, and in your case, absolutely nothing. That's because some CPU instructions cannot be easily accessed from C statements. Some compilers (like Intel's) are better at this than others, but for what you want to achieve, no compiler is smart enough.

What you have to do is to find the specific algorithm, learn the SSE instructions and rewrite the algorithm with those instructions manually. You can write in pure assembly, or use intrinsic functions, which can be called from C/C++, and will issue SSE instructions when compiled.
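For the Hamming distance in the question, a hand-written version could look roughly like the sketch below. It is only an illustration, not OpenCV's implementation; hamming_distance and popcount64 are hypothetical names. On 64-bit MSVC it uses the _mm_popcnt_u64 intrinsic, which emits popcnt regardless of the /arch setting, so the CPU must support it. On gcc/clang it uses __builtin_popcountll, which only compiles to the popcnt instruction when that is enabled (for example with -mpopcnt or -msse4.2).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(_MSC_VER) && defined(_M_X64)
#include <nmmintrin.h>  // _mm_popcnt_u64
static inline unsigned popcount64(std::uint64_t x) {
    return static_cast<unsigned>(_mm_popcnt_u64(x));
}
#elif defined(__GNUC__) || defined(__clang__)
static inline unsigned popcount64(std::uint64_t x) {
    return static_cast<unsigned>(__builtin_popcountll(x));
}
#else
// Software fallback: clears the lowest set bit once per iteration.
static inline unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}
#endif

// Hamming distance between two binary descriptors of `bytes` bytes each:
// XOR the descriptors and count the bits that differ.
inline unsigned hamming_distance(const unsigned char* a,
                                 const unsigned char* b,
                                 std::size_t bytes) {
    unsigned dist = 0;
    std::size_t i = 0;
    // Process 8 bytes at a time; memcpy avoids unaligned-access problems.
    for (; i + 8 <= bytes; i += 8) {
        std::uint64_t wa = 0;
        std::uint64_t wb = 0;
        std::memcpy(&wa, a + i, 8);
        std::memcpy(&wb, b + i, 8);
        dist += popcount64(wa ^ wb);
    }
    // Handle any remaining bytes one at a time.
    for (; i < bytes; ++i) {
        dist += popcount64(static_cast<std::uint64_t>(a[i] ^ b[i]));
    }
    return dist;
}
```

For a 32-byte descriptor this comes down to four XORs and four bit counts per comparison.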

Upvotes: 6
