Reputation: 23052
Suppose I have a class which contains a large number of other class declarations. Is it possible to spread the cost of these out somehow so that compile-time memory consumption doesn't grow quadratically for nested types? I'm willing to take a hit on compile time if need be, and would gladly divide this out into different translation units if that were an option.
To try and come up with a solution to this I've written the following program that illustrates a simplified version of the kind of code that leads to these blowouts:
#include <tuple>

// Add T to the list of args of SeqWithArgs N number of times:
template <int N, typename T, typename SeqWithArgs>
struct Append;

template <int N, typename T, template <typename...> class Seq, typename... Args>
struct Append<N, T, Seq<Args...>>
{
    using type = typename Append<N-1, T, Seq<Args..., T>>::type;
};

template <typename T, template <typename...> class Seq, typename... Args>
struct Append<0, T, Seq<Args...>>
{
    using type = Seq<Args...>;
};

static constexpr const int N = 10;

// Tuple containing N instances of int
using Small = typename Append<N, int, std::tuple<>>::type;
// Tuple containing N instances of Small
using Big = typename Append<N, Small, std::tuple<>>::type;
// Tuple containing N instances of Big
using Huge = typename Append<N, Big, std::tuple<>>::type;

int main()
{
    Huge h;
    return 0;
}
As rightly pointed out by Yakk, these Append operations are horribly inefficient. However, in the real version of this code, modifying them would require a fundamental restructuring of the code.
I varied N from 10 to 70, compiled the program with GCC 4.8.1, and ran time -v make to obtain peak resident memory usage. Here are the results using just the default flags:
This result seems excessive to me, not because of the shape (it is expected to be O(N^3) and seems to follow that shape), but because of the magnitude. It almost looks like Small is being expanded for every instantiation of Big, and Big in turn is being expanded for every instantiation of Huge. In less template-heavy code one would normally declare a generic specialisation of some type using the extern keyword and would therefore avoid this "nested expansion", but these are types, not values; does something similar exist for this kind of construct?
What is the cause of this blowout in memory, and what can I do to reduce the memory footprint without changing the types Small, Big and Huge?
Upvotes: 16
Views: 4187
Reputation: 1137
EDIT after the below EDIT: The implementation-dependence of the whole problem just hit me again. In fact, I only see the improvements mentioned below with clang. I just tried the same thing with g++ 4.8.2, and there the compile times and memory usages are comparable to the improved clang values, no matter whether I use inheritance or the original type definitions. For example, the N = 70 case only needs about 3GB of memory, not 12GB as in the OP's case. So, for g++, my suggestion is not actually an improvement.
EDIT: From my original answer below, it was clear that the full nested expansion can be prevented by introducing a new class at each level, where the next nesting level is only a member variable. But I just discovered that the same thing also works with inheritance. The types Small, Big and Huge are not entirely preserved: you lose type identity, but you keep functional (run-time) equivalence. So this is much closer to what the OP wanted than the member trick below. With clang, it reduced compile time for the N = 40 case by about a factor of 7. I am not sure, though, whether it changes the scaling. Here is the code:
// Append and N as defined in the question:
template <typename T>
struct MyType : Append<N, T, std::tuple<>>::type
{
    typedef typename Append<N, T, std::tuple<>>::type Base;
    using Base::Base;
    using Base::operator=;
};

int main()
{
    MyType<MyType<MyType<int>>> huge;
    // You can work with this the same way as with the nested tuples:
    std::get<0>(std::get<0>(std::get<0>(huge)));
    return 0;
}
The basic idea is the same as for the member trick below: by giving the thing at each level a new name that need not/cannot be expanded down to the lowest level (as opposed to a simple typedef or using declaration), the "nestedness" is reduced.
Original answer:
So, apparently, the compiler really is determining the inner types over and over again (in contrast to what I initially said in the comments), otherwise the following wouldn't work: if you relax your condition from "not changing the types Small, Big, Huge" to "not changing the logical structure of Small, Big, Huge", you can reduce compile time enormously by using a class where the nested type is a member, instead of nesting the types themselves. I guess that's because, in this case, the compiler doesn't actually have to nest the types: on each level, the tuple member just consists of a number of Nested<[...]> types, which the compiler cannot/need not expand further. Of course, this comes at a cost: a special way of initialization, a special way of accessing the levels (basically appending ".member" to each access), etc.
#include <tuple>

// Add T to the list of args of SeqWithArgs N number of times:
template <int N, typename T, typename SeqWithArgs>
struct Append;

template <int N, typename T, template <typename...> class Seq, typename... Args>
struct Append<N, T, Seq<Args...>>
{
    using type = typename Append<N-1, T, Seq<Args..., T>>::type;
};

template <typename T, template <typename...> class Seq, typename... Args>
struct Append<0, T, Seq<Args...>>
{
    using type = Seq<Args...>;
};

static constexpr const int N = 40;

template <typename T>
struct Nested {
    typename Append<N, T, std::tuple<>>::type member;
};

int main()
{
    Nested<Nested<Nested<int>>> huge;
    // Access is a little verbose, but could probably be reduced by
    // defining clever template "access classes/functions":
    std::get<0>(std::get<0>(std::get<0>(huge.member).member).member);
    return 0;
}
(Of course, there could also be separate Small, Big and Huge classes instead of the general template Nested, if you want the different levels to have a different structure. It's just for demonstration purposes.)
Upvotes: 1
Reputation: 275385
Append generates Seq<T>, Seq<T,T>, ..., Seq<T,...,T>. Which is less of a problem than the fact that it also generates Append<N-1, T, Seq<T>>, a distinct and unique type for each N and each recursion step.

It generates N unique types of total name length O(N^2*|T|), and the output type is of size O(N*|T|).
We then chain this. Big generates types of total size O(N^2 * N*|int|) = O(N^3*|int|), with end type size O(N^2*|int|). Huge generates types of size O(N^2 * N^2*|int|) = O(N^4*|int|).

That is a lot of types generated: 70^4 ≈ 5000^2 ≈ 25 million total type-name length.

We can do better with a less brain-dead Append. Do it in three steps.
transcribe takes a t<Ts...> and a template<class...>class Seq and copies the Ts... over.
template<class...>struct t{using type=t;};
template<class src, template<class...>class dest>
struct transcribe;
template<class...Ts, template<class...>class dest>
struct transcribe<t<Ts...>,dest>{
using type=dest<Ts...>;
};
template<class src, template<class...>class dest>
using transcribe_t=typename transcribe<src, dest>::type;
append takes any number of t<...>s and appends them.
template<class... ts>
struct append;
template<>struct append<>{using type=t<>;};
template<class...Ts>struct append<t<Ts...>>{using type=t<Ts...>;};
template<class... Ts, class... Us, class... Zs>
struct append<t<Ts...>,t<Us...>,Zs...>:
append<t<Ts...,Us...>,Zs...>
{};
template<class...ts>
using append_t=typename append<ts...>::type;
breaker takes an unsigned value N and breaks it down into powers of two, outputting v<1,3,4> (2^1+2^3+2^4) for 26.
#include <type_traits> // for std::enable_if

template<unsigned...>struct v{using type=v;};
template<unsigned X, class V=v<>, unsigned B=0, class=void>
struct breaker;
template<unsigned X, unsigned...vs, unsigned B>
struct breaker<X, v<vs...>, B, typename std::enable_if<
X&(1<<B)
>::type>:breaker<X&~(1<<B), v<vs...,B>, B+1>
{};
template<unsigned X, unsigned...vs, unsigned B>
struct breaker<X, v<vs...>, B, typename std::enable_if<
!(X&(1<<B))
>::type>:breaker<X&~(1<<B), v<vs...>, B+1>
{};
template<unsigned...vs, unsigned B>
struct breaker<0, v<vs...>, B, void>
{
using type=v<vs...>;
};
template<unsigned X>
using breaker_t=typename breaker<X>::type;
Build takes an unsigned value N and a T. It breaks the N down with breaker. Then we build each power of two into a t<T,T,...,T> by repeated doubling. Then we append them. Then we take the output and transcribe it to Seq<...>.
This generates O(N*logN*logN) types, so for large N it may be better. Plus, most of the types generated are small and simple, which is a plus.
At best this reduces your load by a factor of 10. Worth a try.
Upvotes: 4
Reputation: 118330
Agreed -- looking at the code, it appears to have N^3 complexity.
I don't think there's a compiler smart enough to figure out that the "base" types of Huge are going to be the same as Small's. The compiler actually has to work it out from the bottom up, so to speak, to figure out what's in Huge. It will discover, once it's done, that the base types are the same, but the smarts aren't there to figure that out right from the get-go. So it has to burn the memory and CPU to reach that conclusion.
The graph seems to show at least O(N^2), if not O(N^3), complexity. Some of this is, undoubtedly, template-related; compilers have little leeway when it comes to templates. If you were to graph the time and memory it takes to compile N versus 2N plain class declarations, I'll bet the observed growth would be linear, not cubic.
Upvotes: 1