DaveTweed

Reputation: 113

Are efficient "repeatedly used intermediates" possible in C++ expression template programming?

Here's one thing I haven't seen explicitly addressed in C++ expression template programming, the technique for avoiding unnecessary temporaries by building trees of "inlinable templated objects" that only get collapsed at the assignment operator. Suppose, for illustration, that we're modeling 1-D sequences of values with elementwise application of arithmetic operators like +, *, etc. Call the basic class for fully-created sequences Seq (which holds a fixed-length list of doubles, for the sake of concreteness) and consider the following illustrative pseudo-C++ code.

void f(Seq &a,Seq &b,Seq &c,Seq &d,Seq &e){
    AType t=(a+2*b)/(a+b+c); // question is about what AType can be
    Seq f=d*t;
    Seq g=e*e*t;
    //do something with f and g
}

where expression-templated overloads for +, *, etc. are defined elsewhere. For the line defining t, my understanding is that no clever code rewriting or use of templates allows each chunk of t to be calculated only once while t is also calculated chunkwise rather than all at once. Is that right?

(I can vaguely imagine AType being some kind of object that contains both an expression template type and a cached value that gets written the first time it's evaluated, but that doesn't seem to help with the need to synchronise the two implicit loops in the assignments to f and g.)

While googling, I came across a Master's thesis on another subject that mentions in passing that manual "common subexpression elimination" should be avoided with expression templates, but I'd like to find a more authoritative "it's not possible" or "here's how to do it".

The closest Stack Overflow question is "Intermediate results using expression templates", which seems to be about the type-naming issue rather than the efficiency cost of creating a full intermediate.

Upvotes: 5

Views: 192

Answers (1)

Mooing Duck

Reputation: 66981

Since you obviously don't want to do the entire calculation twice, you have to cache it somehow. The easiest way to cache it seems to be for AType to be a Seq. You say "this has the downside of a full intermediate variable", but that's exactly what you want in this case: that full intermediate is your cache, and it cannot be trivially avoided.

If you profile the code and this turns out to be a chokepoint, the only faster way I can think of is to write a special function that calculates f and g in parallel (in a single fused loop), but that would be super-confusing and is very much not recommended.

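#include <cstddef>

// Fused loop: each element of the expression t is evaluated once and used for both f and g.
// Assumes Seq provides size(), a non-const operator[] and a Seq(std::size_t) constructor.
template <class Expr>
void compute_f_and_g(const Seq &d, const Seq &e, const Expr &t, Seq &f, Seq &g)
{
    for (std::size_t i = 0; i < d.size(); ++i) {
        auto ti = t[i];              // evaluate t's element exactly once
        f[i] = d[i]*ti;
        g[i] = e[i]*e[i]*ti;
    }
}
void f(Seq &a,Seq &b,Seq &c,Seq &d,Seq &e)
{
    auto t = (a+2*b)/(a+b+c);        // unevaluated expression object (its type is unnamed here)
    Seq f(d.size()), g(d.size());    // outputs must be sized before the indexed writes
    compute_f_and_g(d, e, t, f, g);  // renamed: a local Seq named g would hide a function g
    //do something with f and g
}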

Upvotes: 2
