I want a function that takes multiple references to containers and returns the element-wise combination of all of them. Since this operation runs in an extremely hot loop, I would like to unroll as many operations as possible statically, without writing five instances of essentially the same function.
The computation I am performing is essentially:
const auto result = s0 + a1 * s1 + a2 * s2 + ...
where all of the s_i are containers holding the same number of elements. The number of terms to sum over is known at compile time.
Hypothetically, the function I am looking for would behave like this:
inline Container sum(const Container& s0, double a1, const Container& s1, double a2, const Container& s2, ...){
    auto result = Container(s0);
    for (std::size_t i = 0; i < result.size(); ++i)
        result[i] += a1 * s1[i] + a2 * s2[i] + ...;
    return result;
}
For performance reasons I do not want an inner loop over the terms whose bounds are only known at runtime. Also, when I tried using runtime bounds, I could not easily pass a variable number of references to the function. Should I just resort to pointers in that case?
All code needs to be valid C++11; I do not have access to a more modern compiler in this project.
I would group each double with its container, which simplifies the code to:
#include <cstddef> // std::size_t

template <typename C, typename ... Cs>
C sum(const C& c0, const Cs&... cs)
{
    auto result = c0;
    for (std::size_t i = 0; i < result.size(); ++i) {
#if __cplusplus >= 201703L // C++17: fold expression
        result[i] += (cs[i] + ...);
#else // C++11/C++14: pack expansion inside an array initializer
        const int dummy[] = {0, (static_cast<void>(result[i] += cs[i]), 0)...};
        static_cast<void>(dummy); // avoid warning for unused variable.
#endif
    }
    return result;
}
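To see what the C++11 branch actually does, here is a hand-expanded sketch of roughly what sum(c0, c1, c2) reduces to once the pack cs... is expanded. It assumes std::vector<double> as the container, and sum3 is a hypothetical name used only for this illustration. The leading 0 keeps the array non-empty when the pack is empty, and the elements of a braced initializer are evaluated left to right, so the additions happen in argument order.

```cpp
#include <cassert>  // assert, for the usage check in the comment below
#include <cstddef>  // std::size_t
#include <vector>

// Hypothetical illustration: roughly what the C++11 branch of
// sum(c0, c1, c2) becomes after the pack cs... = (c1, c2) is expanded.
std::vector<double> sum3(const std::vector<double>& c0,
                         const std::vector<double>& c1,
                         const std::vector<double>& c2)
{
    std::vector<double> result = c0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        // One "(result[i] += ck[i], 0)" element per pack member; the comma
        // operator performs the addition and then yields the int 0.
        const int dummy[] = {0,
                             (static_cast<void>(result[i] += c1[i]), 0),
                             (static_cast<void>(result[i] += c2[i]), 0)};
        static_cast<void>(dummy);  // the array itself is never read
    }
    return result;
}

// Usage (for example inside main):
//   std::vector<double> c0{1.0, 2.0}, c1{10.0, 20.0}, c2{100.0, 200.0};
//   std::vector<double> r = sum3(c0, c1, c2);
//   assert(r[0] == 111.0 && r[1] == 222.0);
```

Since the number of terms is fixed by the instantiation, there is no inner loop over the terms at all: each element gets one addition per extra argument.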
For the grouping, you could use something like:
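template <typename C>
struct MulContainer
{
    // C++11 has no deduced return type for ordinary functions, so the
    // return type is spelled out (assumes the container holds doubles).
    double operator [](std::size_t i) const { return d * c[i]; }
    double d;
    const C& c;
};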
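So, instead of calling
sum(c0, a1, c1, a2, c2);
you would call:
sum(c0, MulContainer<Container>{a1, c1}, MulContainer<Container>{a2, c2});
In C++11 the template argument has to be written explicitly; deducing it from the braced initializer requires C++17 class template argument deduction.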
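A self-contained C++11 sketch that puts sum and MulContainer together, assuming std::vector<double> as the container type; the values in the usage comment are made up for illustration. MulContainer is a lazy "scalar times container" view, so no temporary scaled container is ever built.

```cpp
#include <cassert>  // assert, for the usage check in the comment below
#include <cstddef>  // std::size_t
#include <vector>

// Element-wise sum of c0 and any number of indexable terms (C++11).
template <typename C, typename... Cs>
C sum(const C& c0, const Cs&... cs)
{
    C result = c0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        // Expands to one "result[i] += term[i]" per argument, left to right.
        const int dummy[] = {0, (static_cast<void>(result[i] += cs[i]), 0)...};
        static_cast<void>(dummy);
    }
    return result;
}

// Pairs a coefficient with a container; c[i] is scaled only when read.
template <typename C>
struct MulContainer
{
    double d;
    const C& c;
    double operator[](std::size_t i) const { return d * c[i]; }
};

// Usage: computes s0 + 2 * s1 + 0.5 * s2 element-wise.
//   typedef std::vector<double> Vec;
//   Vec s0{1.0, 1.0}, s1{1.0, 2.0}, s2{4.0, 8.0};
//   Vec r = sum(s0, MulContainer<Vec>{2.0, s1}, MulContainer<Vec>{0.5, s2});
//   assert(r[0] == 5.0 && r[1] == 9.0);
```

The MulContainer temporaries live until the end of the full expression containing the call, so holding a const reference to the container inside them is safe here.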
If really needed, std::index_sequence lets you keep the first call syntax:
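#include <cstddef> // std::size_t
#include <tuple>   // std::get, std::tie
#include <utility> // std::index_sequence, std::make_index_sequence (C++14)

template <typename C, std::size_t... Is, typename Tuple>
C sum_impl(const C& c0, std::index_sequence<Is...>, const Tuple& t)
{
    // Tuple elements 2*I and 2*I+1 form the I-th (coefficient, container) pair.
    return sum(c0, MulContainer<C>{std::get<2 * Is>(t), std::get<2 * Is + 1>(t)}...);
}

template <typename C, typename ... Ts>
C final_sum(const C& c0, const Ts&... ts)
{
    static_assert(sizeof...(Ts) % 2 == 0, "expected (coefficient, container) pairs");
    return sum_impl(c0, std::make_index_sequence<sizeof...(Ts) / 2>{}, std::tie(ts...));
}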
std::index_sequence is C++14, but it can be implemented in C++11.
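One possible C++11 replacement is sketched below. index_seq and make_index_seq are hypothetical names, not standard library types, and their linear recursion is only meant for a handful of terms. With them, final_sum accepts the original (coefficient, container) call syntax. Coefficients are converted to double explicitly so that an integer literal such as 2 does not trigger a narrowing error in the braced initializer.

```cpp
#include <cassert>  // assert, for the usage check in the comment below
#include <cstddef>  // std::size_t
#include <tuple>    // std::get, std::tie
#include <vector>

// Hypothetical C++11 stand-ins for std::index_sequence / make_index_sequence.
template <std::size_t... Is>
struct index_seq {};

// make_index_seq<N>::type is index_seq<0, 1, ..., N-1>.
template <std::size_t N, std::size_t... Is>
struct make_index_seq : make_index_seq<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct make_index_seq<0, Is...>
{
    typedef index_seq<Is...> type;
};

template <typename C, typename... Cs>
C sum(const C& c0, const Cs&... cs)
{
    C result = c0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        const int dummy[] = {0, (static_cast<void>(result[i] += cs[i]), 0)...};
        static_cast<void>(dummy);
    }
    return result;
}

template <typename C>
struct MulContainer
{
    double d;
    const C& c;
    double operator[](std::size_t i) const { return d * c[i]; }
};

template <typename C, std::size_t... Is, typename Tuple>
C sum_impl(const C& c0, index_seq<Is...>, const Tuple& t)
{
    // Tuple elements 2*I and 2*I+1 form the I-th (coefficient, container) pair.
    return sum(c0, MulContainer<C>{static_cast<double>(std::get<2 * Is>(t)),
                                   std::get<2 * Is + 1>(t)}...);
}

template <typename C, typename... Ts>
C final_sum(const C& c0, const Ts&... ts)
{
    static_assert(sizeof...(Ts) % 2 == 0, "expected (coefficient, container) pairs");
    return sum_impl(c0, typename make_index_seq<sizeof...(Ts) / 2>::type{},
                    std::tie(ts...));
}

// Usage: same result as s0 + 2 * s1 + 0.5 * s2.
//   typedef std::vector<double> Vec;
//   Vec s0{1.0, 1.0}, s1{1.0, 2.0}, s2{4.0, 8.0};
//   Vec r = final_sum(s0, 2.0, s1, 0.5, s2);
//   assert(r[0] == 5.0 && r[1] == 9.0);
```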