lazyguy

Reputation: 107

Variadic template for taking multiple containers references

I want a function that takes references to several containers and returns their element-wise combination. Because this operation runs in an extremely hot loop, I would like as many operations as possible to be unrolled statically, without writing 5 instances of basically the same function.

The algorithm I am performing basically behaves as

const auto result = s0 + a1 * s1 + a2 * s2 + ...

where every si is a container holding the same number of elements. The number of elements to sum over is known at compile time.

The function I am looking for should behave as: (hypothetically)

inline Container sum(const Container& s0, double a1, const Container& s1, double a2, const Container& s2, ...){
    auto result = Container(s0);
    for (int i = 0; i < result.size(); ++i)
        result[i] += a1 * s1[i] + a2 * s2[i] + ...;
    return result;
}

For performance reasons I would rather not write an inner loop with runtime bounds checks. When I tried runtime bounds anyway, I also found no easy way to pass a variable number of references to the function. Should I just resort to pointers in that case?

All code needs to be valid C++11; I do not have access to a more modern compiler in this project.

Upvotes: 1

Views: 142

Answers (1)

Jarod42

Reputation: 217283

I would group each double with its container, which simplifies the code to:

template <typename C, typename ... Cs>
C sum(const C& c0, const Cs&... cs)
{
    auto result = c0;
    for (int i = 0; i < result.size(); ++i) {
#if 0 // C++17
        result[i] += (cs[i] + ...); 
#else // C++11/C++14
        // Pack expansion inside a braced list: one += per container, left to right.
        const int dummy[] = {0, (static_cast<void>(result[i] += cs[i]), 0)...};
        static_cast<void>(dummy); // avoid warning for unused variable.
#endif
    }
    return result;
}
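
For reference, the C++11 branch relies on a pack expansion inside a braced initializer list, whose elements are evaluated left to right. Below is a minimal self-contained sketch of that trick. It assumes `std::array<double, 3>` as the container (the question does not name its container type), and `sum_all` and `sum_all_self_check` are hypothetical names chosen to avoid confusion with the `sum` above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the answer's sum(); valid C++11.
template <typename C, typename... Cs>
C sum_all(const C& c0, const Cs&... cs)
{
    C result = c0;
    for (std::size_t i = 0; i < result.size(); ++i) {
        // Expands to {0, (result[i] += c1[i], 0), (result[i] += c2[i], 0), ...}.
        // Elements of a braced-init-list are evaluated in order, left to right.
        // The leading 0 keeps the array non-empty when the pack is empty.
        const int dummy[] = {0, (static_cast<void>(result[i] += cs[i]), 0)...};
        static_cast<void>(dummy); // silence the unused-variable warning
    }
    return result;
}

// Small self-check; the assert compiles away when NDEBUG is defined.
inline void sum_all_self_check()
{
    const std::array<double, 3> a = {{1.0, 2.0, 3.0}};
    const std::array<double, 3> b = {{10.0, 20.0, 30.0}};
    const std::array<double, 3> c = {{100.0, 200.0, 300.0}};
    const std::array<double, 3> r = sum_all(a, b, c);
    assert(r[0] == 111.0 && r[1] == 222.0 && r[2] == 333.0);
}
```

Because the number of containers is part of the template instantiation, the compiler sees a fixed sequence of `+=` operations per element and no inner loop over the containers.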

So, for the grouping, something like:

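template <typename C>
struct MulContainer
{
    double d;
    const C& c;

    // C++11: spell out the return type; a plain `auto` return type needs C++14.
    // The members are declared first so that decltype can refer to them.
    auto operator [](int i) const -> decltype(d * c[i]) { return d * c[i]; }
};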

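So at the call site, instead of

sum(c0, a1, c1, a2, c2);

you would have the following, where C stands for the container type (it has to be written out in C++11, which has no class template argument deduction):

sum(c0, MulContainer<C>{a1, c1}, MulContainer<C>{a2, c2});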
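
In C++11 the template argument of `MulContainer` cannot be deduced from a braced initializer, but a function template can deduce it. The following is a minimal sketch of such a factory. The name `scaled` is hypothetical and not part of the answer, the wrapper is restated in a C++11-compatible form so the block compiles on its own, and `std::array` is only an assumed container type.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// C++11-compatible restatement of the wrapper: the return type is spelled
// with decltype because a plain `auto` return type needs C++14.
template <typename C>
struct MulContainer
{
    double d;
    const C& c; // non-owning: the wrapped container must outlive the wrapper

    auto operator[](std::size_t i) const -> decltype(d * c[i]) { return d * c[i]; }
};

// Hypothetical factory: deduces C from the argument, so the call site
// does not have to spell out MulContainer<SomeContainer>.
template <typename C>
MulContainer<C> scaled(double d, const C& c)
{
    return MulContainer<C>{d, c};
}

// Small self-check; the assert compiles away when NDEBUG is defined.
inline void scaled_self_check()
{
    const std::array<double, 2> c1 = {{1.0, 2.0}};
    const MulContainer<std::array<double, 2> > m = scaled(0.5, c1);
    assert(m[0] == 0.5 && m[1] == 1.0);
}
```

With such a helper the call could read `sum(c0, scaled(a1, c1), scaled(a2, c2))`. Since the wrapper only stores a reference, it should not be created from a temporary container that dies before `sum` runs.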

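If you really need it, std::index_sequence lets you keep the original call syntax:

#include <tuple>   // std::tie, std::get
#include <utility> // std::index_sequence (C++14)

template <typename C, std::size_t... Is, typename Tuple>
C sum_impl(const C& c0, std::index_sequence<Is...>, const Tuple& t)
{
    // Tuple slots 2*Is and 2*Is + 1 hold a (coefficient, container) pair.
    // Assumes every container has the same type C as c0.
    return sum(c0, MulContainer<C>{std::get<2 * Is>(t), std::get<2 * Is + 1>(t)}...);
}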

template <typename C, typename ... Ts>
C final_sum(const C& c0, const Ts&... ts)
{
    static_assert(sizeof...(Ts) % 2 == 0, "expected (coefficient, container) pairs"); // the message is mandatory before C++17
    return sum_impl(c0, std::make_index_sequence<sizeof...(Ts) / 2>{}, std::tie(ts...));
}

std::index_sequence is C++14 but can be implemented in C++11.
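
One possible C++11 implementation is sketched below. The namespace `cxx11` and the helper `make_index_sequence_impl` are hypothetical names, and the generator recurses linearly, which should be fine for the handful of containers discussed here but is not how standard libraries typically implement it.

```cpp
#include <cstddef>
#include <type_traits>

namespace cxx11 { // hypothetical namespace mirroring the C++14 facilities

template <std::size_t... Is>
struct index_sequence
{
    static constexpr std::size_t size() { return sizeof...(Is); }
};

// Recursive generator: prepend N-1 to the pack until N reaches 0.
template <std::size_t N, std::size_t... Is>
struct make_index_sequence_impl : make_index_sequence_impl<N - 1, N - 1, Is...> {};

// Base case: the pack now holds 0, 1, ..., N-1.
template <std::size_t... Is>
struct make_index_sequence_impl<0, Is...>
{
    using type = index_sequence<Is...>;
};

template <std::size_t N>
using make_index_sequence = typename make_index_sequence_impl<N>::type;

} // namespace cxx11

// Compile-time check of the generated sequence.
static_assert(std::is_same<cxx11::make_index_sequence<3>,
                           cxx11::index_sequence<0, 1, 2> >::value,
              "make_index_sequence<3> should be index_sequence<0, 1, 2>");
```

To use it with the code above, replace `std::index_sequence` and `std::make_index_sequence` with the `cxx11::` versions.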

Upvotes: 2
