nosbor
nosbor

Reputation: 3009

Optimize C++ template executions

I am working on project where the performance is critical. The application is processing a huge amount of data. Code is written in C++ and I need to do some changes.

There is given following code (It is NOT my code and I simplified it to minimum):

void process<int PARAM1, int PARAM2>() {
    // processing the data
}

void processTheData (int param1, int param2) { // wrapper

    if (param1 == 1 && param2 == 1) { // Ugly looking block of if's
        process<1, 1>();
    else if(param1 == 1 && param2 == 2) {
        process<1, 2>();
    else if(param1 == 1 && param2 == 3) {
        process<1, 3>();
    else if(param1 == 1 && param2 == 4) {
        process<1, 4>();
    else if(param1 == 2 && param2 == 1) {
        process<2, 1>();
    else if(param1 == 2 && param2 == 2) {
        process<2, 2>();
    else if(param1 == 2 && param2 == 3) {
        process<2, 3>();
    else if(param1 == 2 && param2 == 4) {
        process<2, 4>();
    }   // and so on....

}

And the main function:

int main(int argc, char *argv[]) {

    factor1 = atoi(argv[1]);
    factor2 = atoi(argv[2]);

    // choose some optimal param1 and param2
    param1 = choseTheOptimal(factor1, factor2);
    param2 = choseTheOptimal(factor1, factor2);

    processTheData(param1, param2); //start processing

    return 0;
}

Hopefully the code looks clear.

The functions:

There is a limited number of values that the params (param1 and param2) takes (Let's say about 10 x 10).

The values of param1 and param2 are NOT known before execution.

If I simply rewrite the process function so it uses the function parameters instead of template constants (means process(int PARAM1, int PARAM2)) then the processing is about 10 times slower.

Because of the above the PARAM1 and PARAM2 must be the constant of process function.

Is there any smart way to get rid of this ugly block of if's located in processTheData function?

Upvotes: 3

Views: 124

Answers (3)

nwp
nwp

Reputation: 10011

This is what I came up with. It uses less fancy features (only enable_if, no variadic templates or function pointers) but it is also less generic. Pasting the code into godbolt indicates that compilers are able to optimize this completely away for the example code which may have a performance advantage in the real code.

#include <type_traits>

template <int param1, int param2>
void process() {
    static_assert(sizeof(int[param1 + 1]) + sizeof(int[param2 + 1]) > 0);
}

template <int limit2, int param1, int param2>
std::enable_if_t<(param2 > limit2)> pick_param2(int) {
    static_assert("Invalid value for parameter 2");
}

template <int limit2, int param1, int param2>
std::enable_if_t<param2 <= limit2> pick_param2(int p) {
    if (p > 0) {
        pick_param2<limit2, param1, param2 + 1>(p - 1);
    } else {
        process<param1, param2>();
    }
}

template <int limit1, int limit2, int param>
std::enable_if_t<(param > limit1)> pick_param1(int, int) {
    static_assert("Invalid value for parameter 1");
}

template <int limit1, int limit2, int param>
std::enable_if_t<param <= limit1> pick_param1(int p1, int p2) {
    if (p1 > 0) {
        pick_param1<limit1, limit2, param + 1>(p1 - 1, p2);
    } else {
        pick_param2<limit2, param, 0>(p2);
    }
}

template <int limit_param1, int limit_param2>
void pick_params(int param1, int param2) {
    pick_param1<limit_param1, limit_param2, 0>(param1, param2);
}

int main() {
    int p1 = 3;
    int p2 = 5;
    pick_params<10, 10>(p1, p2);
}

I'd be interested in profiling results.

Upvotes: 0

Yakk - Adam Nevraumont
Yakk - Adam Nevraumont

Reputation: 275956

This is what I call the matic switch. It takes a runtime value (within a specified range), and turns it into a compile time value.

namespace details 
{
  template<std::size_t I>
  using index_t = std::integral_constant<std::size_t, I>;

  template<class F>
  using f_result = std::result_of_t< F&&(index_t<0>) >;
  template<class F>
  using f_ptr = f_result<F>(*)(F&& f);
  template<class F, std::size_t I>
  f_ptr<F> get_ptr() {
    return [](F&& f)->f_result<F> {
      return std::forward<F>(f)(index_t<I>{});
    };
  }
  template<class F, std::size_t...Is>
  auto dispatch( F&& f, std::size_t X, std::index_sequence<Is...> ) {
    static const f_ptr<F> table[]={
      get_ptr<F, Is>()...
    };
    return table[X](std::forward<F>(f));
  }
}
template<std::size_t max, class F>
details::f_result<F>
dispatch( F&& f, std::size_t I ) {
  return details::dispatch( std::forward<F>(f), I, std::make_index_sequence<max>{} );
}

what this does is build a jump table to convert runtime data to a compile time constant. I use a lambda, because it makes it nice and generic, and pass it an integral constant. An integral constant is a runtime stateless object whose type carries the constant with it.

An example use:

template<std::size_t a, std::size_t b>
void process() {
    static_assert( sizeof(int[a+1]) + sizeof(int[b+1]) >= 0 );
}

constexpr int max_factor_1 = 10;
constexpr int max_factor_2 = 10;

int main() {
    int factor1 = 1;
    int factor2 = 5;

    dispatch<max_factor_1>(
      [factor2](auto factor1) {
        dispatch<max_factor_2>(
          [factor1](auto factor2) {
            process< decltype(factor1)::value, decltype(factor2)::value >();
          },
          factor2
        );
      },
      factor1
    );
}

where max_factor_1 and max_factor_2 are constexpr values or expressions.

This uses C++14 for auto lambdas and constexpr implicit cast from integral constants.

Live example.

Upvotes: 4

Richard Hodges
Richard Hodges

Reputation: 69922

Like this.

#include <array>
#include <utility>

template<int PARAM1, int PARAM2>
void process() {
    // processing the data
}

// make a jump table to call process<X, Y> where X is known and Y varies    
template<std::size_t P1, std::size_t...P2s>
constexpr auto make_table_over_p2(std::index_sequence<P2s...>)
{
    return std::array<void (*)(), sizeof...(P2s)>
    {
        &process<int(P1), int(P2s)>...
    };
}

// make a table of jump tables to call process<X, Y> where X and Y both vary    
template<std::size_t...P1s, std::size_t...P2s>
constexpr auto make_table_over_p1_p2(std::index_sequence<P1s...>, std::index_sequence<P2s...> p2s)
{
    using element_type = decltype(make_table_over_p2<0>(p2s));
    return std::array<element_type, sizeof...(P1s)>
    {
        make_table_over_p2<P1s>(p2s)...
    };
}


void processTheData (int param1, int param2) { // wrapper

    // make a 10x10 jump table
    static const auto table = make_table_over_p1_p2(
        std::make_index_sequence<10>(), 
        std::make_index_sequence<10>()
    ) ;

    // todo - put some limit checks here

    // dispatch
    table[param1][param2]();
}

Upvotes: 8

Related Questions