Reputation: 680
I recently became interested in SIMD optimization after coming back to C++ following a long break. Please be descriptive, as I am still a beginner with SIMD instructions.
My question is: is it possible to compile a single cross-platform C++ executable that supports a variety of SIMD instruction sets and picks the best one to use at run time? By "best" I mean best in terms of performance; the most recent instruction sets are usually better.
Example: I compile a game on Windows 10 with an i7-7700K and put it on Steam. Different users will very likely have different CPUs that support different SIMD instruction sets. When the game launches, the best SIMD instruction set available on that machine is detected and used.
Naturally, I would have to adapt my code and support a few hand-selected SIMD instruction sets.
Upvotes: 1
Views: 881
Reputation: 41057
Generally the issue is the level of granularity at which you want to use SIMD. Older math libraries like D3DXMath use an indirect jump (i.e. virtual methods) to select at run time a version of the function that is optimized for the available instruction set. While this works in theory, the function has to do enough work to cover the overhead of the indirect call.
For example: if you call D3DXVec3Dot and it selects a different version for SSE/SSE2, SSE3, or SSE4.1, the cost of calling the function in the first place is most likely higher than the performance savings. To really get a benefit from this kind of optimization, you need larger-scale routines that do thousands of computations at once rather than micro-functions.
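To make the granularity point concrete, here is a minimal sketch of run-time dispatch done once for a whole batch routine instead of per vector operation. All names (`dot_batch`, `dot_batch_scalar`, `dot_batch_avx`, `select_dot`) are made up for illustration and are not part of D3DXMath or DirectXMath. It assumes GCC or Clang on x86 for the detection path (`__builtin_cpu_supports`); MSVC would need a different detection path, for example `__cpuid` from `<intrin.h>`, and other targets simply fall back to the scalar version.

```cpp
#include <cstddef>

// Assumption: the detection path below only covers GCC/Clang on x86.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

// Portable baseline: runs on any CPU.
static float dot_batch_scalar(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

#if HAVE_X86_DISPATCH
// AVX version: only this function is compiled with AVX enabled, so the rest
// of the executable still runs on CPUs without AVX.
__attribute__((target("avx")))
static float dot_batch_avx(const float* a, const float* b, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = 0.0f;
    for (float v : lanes) sum += v;
    for (; i < n; ++i) sum += a[i] * b[i];  // leftover elements
    return sum;
}
#endif

using DotFn = float (*)(const float*, const float*, std::size_t);

// Query the CPU and pick an implementation.
static DotFn select_dot() {
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) return dot_batch_avx;
#endif
    return dot_batch_scalar;
}

// One indirect call per batch; the selection itself happens only on first use.
static float dot_batch(const float* a, const float* b, std::size_t n) {
    static const DotFn fn = select_dot();
    return fn(a, b, n);
}
```

With this shape, the indirect call is paid once per batch of `n` elements rather than once per three-component dot product, which is the kind of amortization described above.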
Note that this is why DirectXMath is an all-inline library that doesn't use indirect jump/dispatch at all. You can count on SSE/SSE2 always being supported for x64, and it's basically always supported for x86. If you happen to be building an EXE/DLL for a platform that always has AVX (such as Xbox One), then use /arch:AVX and the DirectXMath library will use AVX, SSE4.1, SSE3, SSE2/SSE where it makes sense. See this blog post series.
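As a rough illustration of the all-inline, compile-time approach (not DirectXMath's actual source), the sketch below picks an implementation with preprocessor checks, so there is no run-time dispatch at all. `Vec3`, `dot3`, `VEC_SSE4` and `VEC_SSE2` are hypothetical names. It relies on the compiler's predefined macros: `__AVX__` (set by /arch:AVX on MSVC and -mavx on GCC/Clang), `__SSE4_1__` and `__SSE2__` (GCC/Clang), and `_M_X64` / `_M_IX86_FP` (MSVC).

```cpp
// Select the code path at compile time from the compiler's predefined macros.
#if defined(__AVX__) || defined(__SSE4_1__)
#include <smmintrin.h>
#define VEC_SSE4 1
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define VEC_SSE2 1
#endif

struct Vec3 { float x, y, z; };

// Fully inline: the compiler can fold this into the caller, so there is no
// call overhead to amortize.
inline float dot3(const Vec3& a, const Vec3& b) {
#if defined(VEC_SSE4) || defined(VEC_SSE2)
    // Fourth lane is zero so it does not contribute to the sum.
    __m128 va = _mm_set_ps(0.0f, a.z, a.y, a.x);
    __m128 vb = _mm_set_ps(0.0f, b.z, b.y, b.x);
#endif
#if defined(VEC_SSE4)
    // SSE4.1 dot product: multiply lanes 0-2, write the sum to lane 0.
    return _mm_cvtss_f32(_mm_dp_ps(va, vb, 0x71));
#elif defined(VEC_SSE2)
    // Multiply, then sum the four lanes with shuffles.
    __m128 p    = _mm_mul_ps(va, vb);
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(p, shuf);
    __m128 hi   = _mm_movehl_ps(shuf, sums);
    return _mm_cvtss_f32(_mm_add_ss(sums, hi));
#else
    // Scalar fallback for non-x86 targets.
    return a.x * b.x + a.y * b.y + a.z * b.z;
#endif
}
```

The trade-off is that the instruction set is fixed per build: a binary compiled with /arch:AVX will not run on a CPU without AVX, which is why this fits best when the target platform is known or the baseline is SSE2.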
Upvotes: 3