Reputation: 151
I am using the BRIEF descriptor in OpenCV in Visual C++ 2010 to match points in two images.
The paper on the BRIEF descriptor says that it is possible to speed things up:
"The BRIEF descriptor uses hamming distance, which can be done extremely fast on modern CPUs that often provide a specific instruction to perform a XOR or bit count operation, as is the case in the latest SSE instruction set."
With SSE4.2 enabled, matching should be faster. My question is simply: how do I enable this in Visual C++?
An alternative could be to choose another compiler that supports SSE4, for instance Intel's ICC. Is this really necessary?
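For reference, the XOR-and-bit-count Hamming distance that the quote describes can be sketched in portable C++. This is a minimal illustration, not OpenCV's own matcher; it assumes descriptors are stored as raw byte arrays whose length is a multiple of 8 (for example 32 bytes = 256 bits), and the function name `hamming_distance` is hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: Hamming distance between two binary descriptors
// of `len` bytes. Assumes len is a multiple of 8.
int hamming_distance(const unsigned char* a, const unsigned char* b, std::size_t len) {
    int dist = 0;
    for (std::size_t i = 0; i < len; i += 8) {
        std::uint64_t x, y;
        std::memcpy(&x, a + i, 8);  // memcpy avoids unaligned/aliasing problems
        std::memcpy(&y, b + i, 8);
        // XOR leaves a 1 wherever the two descriptors disagree...
        std::bitset<64> diff(x ^ y);
        // ...and the number of set bits is this word's share of the distance.
        dist += static_cast<int>(diff.count());
    }
    return dist;
}
```

Whether `std::bitset::count()` ends up as a single POPCNT instruction depends on the compiler, its standard library, and the target flags, which is exactly what the question is about.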
Upvotes: 5
Views: 6942
Reputation: 141
It seems that Visual Studio 17.11.5 (toolset 14.41) added /arch:SSE4.2.
It does not use very many SSE4.2 instructions, roughly on par with the old undocumented /d2archSSE42, but unlike that switch it is documented: https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
Upvotes: 2
Reputation: 13689
You can pass /arch: options in an undocumented way as /d2... options, for example /d2archAVX.
/d2archSSE42 is accepted this way. It is the only such option that is not available through the documented /arch: switch.
Upvotes: 1
Reputation: 48012
The MSVC compiler has an /arch option for specifying the minimum architecture you want your program to target. Setting it to /arch:SSE2 tells the compiler to assume that the CPU supports the SSE2 instructions, and it will automatically use them whenever the optimizer determines it's appropriate.
However, MSVC has no /arch:SSE4 or /arch:SSE42 option. A peek into the standard library implementation suggests that /arch:AVX or /arch:AVX2 also implies SSE4.2. For example, the MSVC implementation of the C++20 library function std::popcount does a runtime check of the processor to see if it can use the SSE4.2 popcnt instruction. But if you target AVX, it skips the runtime check and just assumes the processor supports it.
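A minimal sketch of leaning on std::popcount for a 256-bit descriptor, assuming the descriptor is held as four 64-bit words. std::popcount needs C++20 (so not Visual C++ 2010), so the sketch falls back to a plain bit-clearing loop when the `__cpp_lib_bitops` feature-test macro is absent. The names `count_bits` and `hamming256` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>
#  endif
#endif

// Number of set bits in one 64-bit word.
inline int count_bits(std::uint64_t x) {
#if defined(__cpp_lib_bitops)
    // C++20 path: per the answer above, MSVC's library checks the CPU at
    // run time, or assumes POPCNT when targeting /arch:AVX or higher.
    return std::popcount(x);
#else
    // Pre-C++20 fallback: clear the lowest set bit until none remain.
    int n = 0;
    for (; x != 0; x &= x - 1) ++n;
    return n;
#endif
}

// Hypothetical helper: Hamming distance between two 256-bit descriptors.
inline int hamming256(const std::array<std::uint64_t, 4>& a,
                      const std::array<std::uint64_t, 4>& b) {
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) d += count_bits(a[i] ^ b[i]);
    return d;
}
```

The design choice here is to let the library pick the instruction: building the same source with /arch:AVX2 should then remove the runtime check without any code change.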
I think gcc and clang do have specific options for enabling SSE4 and SSE4.2. Update: Peter Cordes confirms in the comments: "To enable popcnt specifically, -mpopcnt, or for SSE4.2 -msse4.2 which implies popcnt."
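To check which instruction-set assumptions a build actually baked in, one option is to test the predefined macros. This is a small sketch: GCC and Clang define `__SSE4_2__` for -msse4.2 and `__POPCNT__` for -mpopcnt (or anything that implies it), while `__AVX__` and `__AVX2__` are defined by GCC, Clang, and MSVC for the corresponding AVX options. Which macros a particular MSVC version sets for its newer SSE4.2 switch is an assumption to verify against its documentation. The function name `compile_time_isa` is hypothetical.

```cpp
// Hypothetical helper: reports the highest of a few instruction-set
// levels the compiler was told it may assume (checked at compile time).
const char* compile_time_isa() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_2__)
    return "SSE4.2";
#elif defined(__POPCNT__)
    return "POPCNT";
#else
    return "baseline";
#endif
}
```

Printing the result from a test program is a quick way to confirm that a flag set in the project properties actually reached the compiler.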
You can also use intrinsic functions for built-in instructions if you don't want to rely on the optimizer and the library implementation to find the optimal instructions.
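A sketch of calling the POPCNT instruction through an intrinsic, with a runtime CPU check and a portable fallback, assuming a 64-bit x86 target. `_mm_popcnt_u64` comes from `<nmmintrin.h>`; GCC and Clang need the `target("popcnt")` attribute to allow it without -mpopcnt, while MSVC allows it regardless of /arch. The CPU check uses `__cpuid` (leaf 1, ECX bit 23) on MSVC and `__builtin_cpu_supports` on GCC/Clang. The function names are hypothetical.

```cpp
#include <cstdint>

#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#  include <nmmintrin.h>                 // _mm_popcnt_u64
#  define POPCNT_X64 1
#  define POPCNT_TARGET __attribute__((target("popcnt")))
#elif defined(_MSC_VER) && defined(_M_X64)
#  include <intrin.h>                    // __cpuid
#  include <nmmintrin.h>                 // _mm_popcnt_u64
#  define POPCNT_X64 1
#  define POPCNT_TARGET
#endif

// Portable fallback: clear the lowest set bit until none remain.
static int popcount_fallback(std::uint64_t x) {
    int n = 0;
    while (x != 0) { x &= x - 1; ++n; }
    return n;
}

#if defined(POPCNT_X64)
// Compiled to a single POPCNT instruction.
static POPCNT_TARGET int popcount_hw(std::uint64_t x) {
    return static_cast<int>(_mm_popcnt_u64(x));
}

static bool cpu_has_popcnt() {
#  if defined(_MSC_VER) && !defined(__clang__)
    int info[4];
    __cpuid(info, 1);
    return (info[2] & (1 << 23)) != 0;   // CPUID.1:ECX bit 23 = POPCNT
#  else
    return __builtin_cpu_supports("popcnt") != 0;
#  endif
}
#endif

// Hypothetical helper: uses POPCNT when the CPU has it, else the fallback.
int popcount64(std::uint64_t x) {
#if defined(POPCNT_X64)
    static const bool hw = cpu_has_popcnt();   // checked once
    if (hw) return popcount_hw(x);
#endif
    return popcount_fallback(x);
}
```

The runtime check mirrors what the answer describes MSVC's std::popcount doing; if you know every target CPU supports POPCNT, the check and fallback can be dropped.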
Upvotes: 3
Reputation: 20056
Unfortunately, it doesn't work like that.
The C/C++ compiler can be told to use a specific instruction set in Project Properties -> C/C++ -> Code Generation -> Enable Enhanced Instruction Set. But that does almost nothing, and in your case, absolutely nothing. That's because some CPU instructions cannot easily be reached from plain C statements. Some compilers (like Intel's) are better at this than others, but for what you want to achieve, no compiler is smart enough.
What you have to do is find the specific algorithm, learn the SSE instructions, and rewrite the algorithm with those instructions manually. You can write it in pure assembly, or use intrinsic functions, which can be called from C/C++ and will emit SSE instructions when compiled.
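As an illustration of rewriting part of the matcher with intrinsics, here is a sketch that XORs two 32-byte descriptors 16 bytes at a time with SSE2 (`_mm_loadu_si128`, `_mm_xor_si128`, `_mm_storeu_si128` from `<emmintrin.h>`), assuming a 64-bit x86 target where SSE2 is always present; other targets use a scalar fallback. SSE2 has no bit-count instruction, so the count is done on the two 64-bit halves in scalar code. The function name `hamming32_sse2` is hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) || defined(_M_X64)
#  include <emmintrin.h>   // SSE2 intrinsics
#  define HAVE_SSE2 1
#endif

// Hypothetical helper: Hamming distance between two 32-byte (256-bit) descriptors.
int hamming32_sse2(const unsigned char* a, const unsigned char* b) {
    int dist = 0;
    for (std::size_t i = 0; i < 32; i += 16) {
        std::uint64_t halves[2];
#if defined(HAVE_SSE2)
        // XOR 16 bytes at once in an SSE register (unaligned loads).
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(halves), _mm_xor_si128(va, vb));
#else
        // Scalar fallback for non-x86-64 targets.
        std::uint64_t xa[2], xb[2];
        std::memcpy(xa, a + i, 16);
        std::memcpy(xb, b + i, 16);
        halves[0] = xa[0] ^ xb[0];
        halves[1] = xa[1] ^ xb[1];
#endif
        // Count differing bits in each 64-bit half; this step is where
        // POPCNT (available alongside SSE4.2) would help.
        dist += static_cast<int>(std::bitset<64>(halves[0]).count());
        dist += static_cast<int>(std::bitset<64>(halves[1]).count());
    }
    return dist;
}
```

Whether this beats a compiler-vectorized scalar loop is something to measure on your own data rather than assume.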
Upvotes: 6