Manvi

Reputation: 9

Is there a way to optimize parameter substitution in an equation for faster computation in C++

I have a set of equations that take vector inputs and return solutions for different time steps. A few parameters in the equations are received at runtime and stay constant for that cycle. I want to replace each such parameter with its value computed at runtime, so that further computation uses equations with the parameter values already substituted. The original problem has 4000 such parameters, but I'm giving a sample equation to illustrate what I'm trying to achieve. A smaller example to illustrate the issue:

  rval[0]  = -p1*x + p2*y*z;
  rval[1]  = -rval[0] - p3*y*y - dy;
  rval[0] -=  dx;
  rval[2]  =  x + y + z - 1;

{p1,p2,p3} are a fixed set of parameters whose values are received online, and the new set of equations would essentially become

  rval[0]  = (-0.04)*x + (1.0e4)*y*z;
  rval[1]  = -rval[0] - (3.0e7)*y*y - dy;
  rval[0] -=  dx;
  rval[2]  =  x + y + z - 1;

One way I've tried is the symengine symbolic C++ library, but it adds the overhead of substitution and evaluation through symengine expressions. Code tried for the equations above with the symengine library:

// x_sym, y_sym, ... are SymEngine symbols, and resv1..resv3 are the
// expression trees built from them (construction not shown).

// Create a map for substitution
SymEngine::map_basic_basic subs_map;
subs_map[x_sym] = SymEngine::real_double(x);
subs_map[y_sym] = SymEngine::real_double(y);
subs_map[z_sym] = SymEngine::real_double(z);
subs_map[dx_sym] = SymEngine::real_double(dx);
subs_map[dy_sym] = SymEngine::real_double(dy);
subs_map[dz_sym] = SymEngine::real_double(dz);

// Evaluate equations
rval[0] = SymEngine::eval_double(*resv1->subs(subs_map));
rval[1] = SymEngine::eval_double(*resv2->subs(subs_map));
rval[2] = SymEngine::eval_double(*resv3->subs(subs_map));

The expressions resv1, resv2 and resv3 are the same as the equations above, with the parameters already substituted. Is there an alternate way to achieve high speed when evaluating equations with substituted parameters, or is there a faster way to do this in symengine as well?

Note: I'm using a much larger set of equations than this, which runs for several iterations per time step, but I'm experimenting with a smaller set of equations to check the feasibility of the solution.

Upvotes: 0

Views: 172

Answers (2)

Macroft

Reputation: 29

If the parameters are truly computed/collected at runtime, then keeping them in variables is sufficient. There are ways to hint that a variable doesn't change much, but if you make a point of declaring/calculating those variables outside the loop that evaluates the equations, the compiler will do the right thing: it won't reload a variable from memory when it already has it in a register. Often it can even detect that a computation does not depend on the loop state and will hoist it out of the loop as part of its own internal optimization (meaning you might not notice a performance boost when you make this change yourself).

Now, if there genuinely are 4000 'p's, you want to organize your loops so that everything that uses p[1] is processed, then everything that uses p[2], and so on. If you loop over too many 'p's at once, there will be a lot more memory accesses; if you group all the p[1] work ahead of the p[2] work, you increase the likelihood that each value can stay in a register. The theme is: don't introduce new data until you are done with the old data, if you can avoid it. (In small enough chunks, the optimizer has a chance of doing this for you, and you might not notice a speedup in that chunk.)
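A sketch of that loop ordering (the data layout here is hypothetical, not taken from the question): the outer loop walks the parameters one at a time, and the inner loop finishes every term that uses p[k] before p[k+1] is touched.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout: term[k] holds every operand that parameter p[k]
// multiplies. With the outer loop over k, p[k] is loaded once and stays in
// a register for the whole inner loop, rather than being reloaded per use.
double accumulate(const std::vector<double>& p,
                  const std::vector<std::vector<double>>& term) {
    double total = 0.0;
    for (std::size_t k = 0; k < p.size(); ++k) {
        const double pk = p[k];                       // one load per parameter
        for (std::size_t i = 0; i < term[k].size(); ++i)
            total += pk * term[k][i];                 // all uses of p[k] together
    }
    return total;
}
```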

Pulling repeated computations outside the body of the loop is a code optimization; the rest is optimization against cache misses (which can also result in values being kept in registers).

If you are merely reading in a file of 4000 parameters that change rarely (i.e. you can tolerate rebuilding the executable between changes), then you could consider putting them in a header file as a static const double p[].

I assume you are writing loops, and this might not gain you anything.

But it will do the substitution at compile time and optimize the values as constants where it can. If you just have 4000 hardcoded indexes/names in your equations, this will indeed get you some pretty good optimization. It smells funny, though, and there might be a better design for your software (out of scope of the question).

- Out of scope of 'parameter substitution optimization': if you are solving many repeated things similar to your example, looking at CUDA/GPU to do it massively in parallel might be attractive.

Upvotes: 1

Yakk - Adam Nevraumont

Reputation: 275210

Using double p2 = 1.0e4 or the literal constant 1.0e4 really doesn't matter much to a floating-point mathematics unit. If you are using -ffast-math it can make some difference, as that lets the compiler reorder floating-point expressions so they don't exactly match the IEEE-required results, instead generating code that is algebraically equivalent.

See What does gcc's ffast-math actually do? for details on what -ffast-math does and alternatives.

Even with -ffast-math, true constants (i.e. recompiling the program with the constants hard-coded) might enable a significant rewrite in some cases (e.g. the compiler works out that your constants mean some value is multiplied by 0, so it skips calculating that value entirely), but short of that you aren't going to get much improvement. Humans might get faster if asked to multiply by 3 times ten to the 7 a few million times in a row, but an FPU won't: the millionth multiplication will be just as fast as the first.

You can arrange your math to make FPUs faster, but that mainly involves making your math follow SIMD patterns, doing 4 or 8 or more parallel floating-point operations per cycle; or, in more extreme cases, arranging for your code to run on a GPU (which works somewhat like a super-wide SIMD FPU, doing hundreds of identical operations in parallel very cheaply).

None of these look like things that would magically happen after you "fixed some parameters". They would require work on your part to get working.

Upvotes: 1
