RAC
RAC

Reputation: 5239

Template-ing a 'for' loop in C++?

I have a C++ snippet below with a run-time for loop,

for(int i = 0; i < I; i++)
  for (int j = 0; j < J; j++)
    A( row(i,j), column(i,j) ) = f(i,j);

The snippet is called repeatedly. The loop bounds 'I' and 'J' are known at compile time (I/J are the order of 2 to 10). I would like to unroll the loops somehow using templates. The main bottleneck is the row() and column() and f() functions. I would like to replace them with equivalent metaprograms that are evaluated at compile-time, using row<i,j>::enum tricks.

What I'd really love is something that eventually resolves the loop into a sequence of statements like:

A(12,37) = 0.5;
A(15,23) = 0.25;
A(14,45) = 0.25;

But I'd like to do so without wrecking the for-for structure too much. Something in the spirit of:

TEMPLATE_FOR<i,0,I>
  TEMPLATE_FOR<j,0,J>
     A( row<i,j>::value, column<i,j>::value ) = f<i,j>::value

Can boost::lambda (or something else) help me create this?

Upvotes: 26

Views: 17984

Answers (9)

Nick Dandoulakis
Nick Dandoulakis

Reputation: 43110

Check out Template Metaprograms and the bubble sort implementations.

Upvotes: 4

CiscoIPPhone
CiscoIPPhone

Reputation: 9477

You could use Boost MPL.

An example of loop unrolling is on this mpl::for_each page.

for_each< range_c<int,0,10> >( value_printer() );

It doesn't seem that it's all evaluated at compile time, but it may be a good starting point.

Upvotes: 6

James Hopkin
James Hopkin

Reputation: 13973

This is the way to do it directly:

template <int i, int j>
struct inner
{
  static void value()
  {
    A(row<i,j>::value, column<i,j>::value) = f<i,j>::value;
    inner<i, j+1>::value();
  }
};

template <int i> struct inner<i, J> { static void value() {} };

template <int i>
struct outer
{
  static void value()
  {
    inner<i, 0>::value();
    outer<i+1>::value();
  }
};

template <> struct outer<I> { static void value() {} };

void test()
{
  outer<0>::value();
}

You can pass A through as a parameter to each of the values if necessary.

Here's a way with variadic templates that doesn't require hard coded I and J:

#include <utility>

template <int j, class Columns>
struct Inner;

template <class Columns, class Rows>
struct Outer;

template <int j, int... i>
struct Inner<j, std::index_sequence<i...>>
{
  static void value() { (A(column<i, j>::value, row<i, j>::value), ...); }
};


template <int... j, class Columns>
struct Outer<std::index_sequence<j...>, Columns>
{
  static void value() { (Inner<j, Columns>::value(), ...); }
};

template <int I, int J>
void expand()
{
  Outer<std::make_index_sequence<I>, std::make_index_sequence<J>>::value();
}

void test()
{
  expand<3, 5>();
}

(snippet with generated assembly: https://godbolt.org/g/DlgmEl)

Upvotes: 8

Ferruccio
Ferruccio

Reputation: 100658

I've never tried to do this so take this idea with a grain of salt...

It seems like you could use Boost.Preprocessor to do the loop unrolling (specifically the BOOST_PP_FOR and BOOST_PP_FOR_r macros) and then use templates to generate the actual constant expression.

Upvotes: 1

Greg Rogers
Greg Rogers

Reputation: 36439

If you are willing to modify the syntax a bit you can do something like this:

template <int i, int ubound>
struct OuterFor {
    void operator()() {
        InnerFor<i, 0, J>()();
        OuterFor<i + 1, ubound>()();
    }
};

template <int ubound>
struct OuterFor <ubound, ubound> {
    void operator()() {
    }
};

In InnerFor, i is the outer loops counter (compile time constant), j is the inner loops counter (initially 0 - also compile time constant), so you can evaluate row as a compile time template.

Its a bit more complicated, but as you say, row(), col(), and f() are your complicated parts anyways. At least try it and see if the performance is worth it. It may be worth it to investigate other options to simplify your row(), etc functions.

Upvotes: 1

James Hopkin
James Hopkin

Reputation: 13973

f would need to return a double - that can't be done at compile time.

Upvotes: 1

Ben
Ben

Reputation: 7608

I would say it is a false good-idea.

In C++ this : row<i,j>::value
means you will have as many differents row<>() functions than you have i * j. You don't want this because it will increase the size of the code and do a lot of instruction cache misses.

I observed this when I was doing template functions to avoid a single boolean check.

If is a short function just inline it.

Upvotes: 4

Stack Overflow is garbage
Stack Overflow is garbage

Reputation: 247959

You could use Boost.Mpl to implement the whole thing at compile-time, but I'm not sure it'll be faster. (Mpl essentially re-implements all the STL algorithms as compile-time metaprogramming templates)

The problem with that approach is that you end up unrolling and inlining a lot of code, which may thrash the instruction cache and eat up memory bandwidth that could have been saved. That may produce huge, bloated and slow code.

I would probably probably rather trust the compiler to inline the functions that make sense. As long as the row and column function definitions are visible from the loop, the compiler can trivially inline the calls and unroll as many iterations as it deems beneficial.

Upvotes: 3

T.E.D.
T.E.D.

Reputation: 44804

A good compiler should do unrolling for you. For instance, in gcc compiling with the -O2 option turns on loop unrolling.

If you try to do it yourself manually, unless you measure things carefully and really know what you are doing, you are liable to end up with slower code. For example, in your case with manual unrolling you are liable to prevent the compiler from being able to do a loop interchange or stripmine optimization (look for --floop-interchange and -floop-strip-mine in the gcc docs)

Upvotes: 13

Related Questions