This is admittedly an open-ended/subjective question but I am looking for different ideas on how to "organize" multiple alternative implementations of the same functions.
I have a set of several functions that each have platform-specific implementations. Specifically, they each have a different implementation for a particular SIMD type: NEON (64-bit), NEON (128-bit), SSE3, AVX2, etc (and one non-SIMD implementation).
All functions have a non-SIMD implementation. Not all functions are specialized for each SIMD type.
Currently, I have one monolithic file that uses a mess of #ifdefs to implement the particular SIMD specializations. It worked when we were only specializing a few of the functions to one or two SIMD types. Now, it's become unwieldy.
Effectively, I need something that works like virtual/override: the non-SIMD implementations live in a base class, and SIMD specializations (if any) override them. But I don't want actual runtime polymorphism. This code is performance critical and many of the functions can (and should) be inlined.
Something along these lines would accomplish what I need (though it's still a mess of #ifdefs):
// functions.h
void function1();
void function2();
#ifdef __ARM_NEON
#include "functions_neon64.h"
#elif __SSE3__
#include "functions_sse3.h"
#endif
#include "functions_unoptimized.h"
// functions_neon64.h
#ifndef FUNCTION1_IMPL
#define FUNCTION1_IMPL
void function1() {
// NEON64 implementation
}
#endif
// functions_sse3.h
#ifndef FUNCTION2_IMPL
#define FUNCTION2_IMPL
void function2() {
// SSE3 implementation
}
#endif
// functions_unoptimized.h
#ifndef FUNCTION1_IMPL
#define FUNCTION1_IMPL
void function1() {
// Non-SIMD implementation
}
#endif
#ifndef FUNCTION2_IMPL
#define FUNCTION2_IMPL
void function2() {
// Non-SIMD implementation
}
#endif
Anyone have any better ideas?
Upvotes: 5
Views: 554
The following are just some ideas that I came up with while thinking about it - there might be better solutions that I'm not aware of.
Using Tag-Dispatch you can define an order in which the functions should be considered by the compiler, e.g. in this case it's
AVX2 -> SSE3 -> Neon128 -> Neon64 -> None
The first implementation that's present in this chain will be used: godbolt example
/**********************************
** functions.h *******************
*********************************/
#include <iostream>
struct SIMD_None_t {};
struct SIMD_Neon64_t : SIMD_None_t {};
struct SIMD_Neon128_t : SIMD_Neon64_t {};
struct SIMD_SSE3_t : SIMD_Neon128_t {};
struct SIMD_AVX2_t : SIMD_SSE3_t {};
struct SIMD_Any_t : SIMD_AVX2_t {};
#include "functions_unoptimized.h"
#ifdef __ARM_NEON
#include "functions_neon64.h"
#endif
#ifdef __SSE3__
#include "functions_sse3.h"
#endif
// etc...
#include "functions_stubs.h"
/**********************************
** functions_unoptimized.h *******
*********************************/
inline int add(int a, int b, SIMD_None_t) {
std::cout << "NONE" << std::endl;
return a + b;
}
/**********************************
** functions_neon64.h ************
*********************************/
inline int add(int a, int b, SIMD_Neon64_t) {
std::cout << "NEON!" << std::endl;
return a + b;
}
/**********************************
** functions_neon128.h ***********
*********************************/
inline int add(int a, int b, SIMD_Neon128_t) {
std::cout << "NEON128!" << std::endl;
return a + b;
}
/**********************************
** functions_stubs.h *************
*********************************/
inline int add(int a, int b) {
return add(a, b, SIMD_Any_t{});
}
/**********************************
** main.cpp **********************
*********************************/
#include "functions.h"
int main() {
add(1, 2);
}
This would output NEON128!, since that's the best match in this case.
Upsides:
- No #ifdefs are needed in the implementation header files
Downsides:
- You need a stub (without the tag parameter) for each function (alternatively you could pass SIMD_Any_t{} everywhere you call the function, but that's a lot of work)

Another option is a class hierarchy in which each SIMD type is a struct of static functions, each deriving from the next-less-specific type, e.g.:
struct None { inline static int add(int a, int b) { return a + b; } };
struct Neon64 : None { inline static int add(int a, int b) { return a + b; } };
struct Neon128 : Neon64 {};
struct SIMD : Neon128 {};
// Usage:
int r = SIMD::add(1, 2);
Because child classes can hide members of their base classes, this is not ambiguous: the most-derived class that implements a given method is the one whose version gets called, so you can order your implementations.
For your example it could look like this: godbolt example
#include <iostream>
/**********************************
** functions.h *******************
*********************************/
#include "functions_unoptimized.h"
#ifdef __ARM_NEON
#include "functions_neon64.h"
#else
struct SIMD_Neon64 : SIMD_None {};
#endif
#ifdef __ARM_NEON_128
#include "functions_neon128.h"
#else
struct SIMD_Neon128 : SIMD_Neon64 {};
#endif
// etc...
struct SIMD : SIMD_Neon128 {};
/**********************************
** functions_unoptimized.h *******
*********************************/
struct SIMD_None {
inline static int sub(int a, int b) {
std::cout << "NONE" << std::endl;
return a - b;
}
};
/**********************************
** functions_neon64.h ************
*********************************/
struct SIMD_Neon64 : SIMD_None {
inline static int sub(int a, int b) {
std::cout << "Neon64" << std::endl;
return a - b;
}
};
/**********************************
** functions_neon128.h ***********
*********************************/
struct SIMD_Neon128 : SIMD_Neon64 {
inline static int sub(int a, int b) {
std::cout << "Neon128" << std::endl;
return a - b;
}
};
/**********************************
** main.cpp **********************
*********************************/
#include "functions.h"
int main() {
SIMD::sub(2, 3);
}
This would output Neon128.
Upsides:
- No #ifdefs are needed in the implementation header files
Downsides:
- You have to prefix every call with SIMD::
- Every #ifdef in functions.h needs an #else branch that defines an empty pass-through struct
If you have an enum of all possible SIMD implementations, e.g.:
enum class SIMD_Type {
Min, // Dummy Value -> No Implementation found
None,
Neon64,
Neon128,
SSE3,
AVX2,
Max // Dummy Value -> Search downwards from here
};
You can use it to (recursively) walk through them until you find one that has been specialized, e.g.:
template<SIMD_Type type = SIMD_Type::Max>
inline int add(int a, int b) {
constexpr SIMD_Type nextType = static_cast<SIMD_Type>(static_cast<int>(type) - 1);
return add<nextType>(a, b);
}
template<>
inline int add<SIMD_Type::Neon64>(int a, int b) {
std::cout << "NEON!" << std::endl;
return a + b;
}
Here a call to add(1, 2) would first call add<SIMD_Type::Max>, which in turn calls add<SIMD_Type::AVX2>, add<SIMD_Type::SSE3>, and add<SIMD_Type::Neon128>; the call to add<SIMD_Type::Neon64> then picks the specialization, so the recursion stops there.
If you want to make this a bit safer (to prevent long template instantiation chains) you can additionally add one specialization per function that stops the recursion if no implementation is found at all, e.g.: godbolt example
template<>
inline int add<SIMD_Type::Min>(int a, int b) = delete; // no implementation found -> compile error at the call site
In your case it could look like this:
#include <iostream>
/**********************************
** functions.h *******************
*********************************/
enum class SIMD_Type {
Min, // Dummy Value -> No Implementation found
None,
Neon64,
Neon128,
SSE3,
AVX2,
Max // Dummy Value -> Search downwards from here
};
#include "functions_stubs.h"
#include "functions_unoptimized.h"
#ifdef __ARM_NEON
#include "functions_neon64.h"
#endif
#ifdef __SSE3__
#include "functions_sse3.h"
#endif
// etc...
/**********************************
** functions_stubs.h *************
*********************************/
template<SIMD_Type type = SIMD_Type::Max>
inline int add(int a, int b) {
constexpr SIMD_Type nextType = static_cast<SIMD_Type>(static_cast<int>(type) - 1);
return add<nextType>(a, b);
}
template<>
inline int add<SIMD_Type::Min>(int a, int b) = delete; // no implementation found -> compile error at the call site
/**********************************
** functions_unoptimized.h *******
*********************************/
template<>
inline int add<SIMD_Type::None>(int a, int b) {
std::cout << "NONE" << std::endl;
return a + b;
}
/**********************************
** functions_neon64.h ************
*********************************/
template<>
inline int add<SIMD_Type::Neon64>(int a, int b) {
std::cout << "NEON!" << std::endl;
return a + b;
}
/**********************************
** functions_neon128.h ***********
*********************************/
template<>
inline int add<SIMD_Type::Neon128>(int a, int b) {
std::cout << "NEON128!" << std::endl;
return a + b;
}
/**********************************
** main.cpp **********************
*********************************/
#include "functions.h"
int main() {
add(1, 2);
}
This would output NEON128!.
Upsides:
- Normal call syntax; no #ifdefs are needed in the implementation header files
Downsides:
- The compiler is not guaranteed to inline the whole recursive call chain. Most compilers provide a force-inline attribute (__attribute__((always_inline)) / __forceinline) which you could add to the function base templates to make sure all recursive calls actually get inlined.

The last option - putting each function (or a collection of similar functions) into its own file and doing the #ifdefs there - is by far the easiest.
That way you have all the functions & their specializations for SIMD in a single file, which should also make editing a lot easier.
e.g.:
/**********************************
** functions.h *******************
*********************************/
#include "functions_add.h"
#include "functions_sub.h"
// etc...
/**********************************
** functions_add.h ***************
*********************************/
#ifdef __SSE3__
// SSE3
inline int add(int a, int b) {
    return a + b;
}
#elif defined(__ARM_NEON)
// NEON
inline int add(int a, int b) {
    return a + b;
}
#else
// Fallback
inline int add(int a, int b) {
    return a + b;
}
#endif
/**********************************
** functions_sub.h ***************
*********************************/
#ifdef __SSE3__
// SSE3
inline int sub(int a, int b) {
    return a - b;
}
#elif defined(__ARM_NEON_128)
// NEON 128
inline int sub(int a, int b) {
    return a - b;
}
#else
// Fallback
inline int sub(int a, int b) {
    return a - b;
}
#endif
Upsides:
- All specializations of a function live in one place
Downsides:
- The #ifdef chains need to be repeated in each header

Upvotes: 5