bremen_matt

Reputation: 7367

Eigen: Efficiently storing the output of a matrix evaluation in a raw pointer

I am using some legacy C code that passes around lots of raw pointers. To interface with the code, I have to pass a function of the form:

const int N = ...;

T * func(T * x)  {
    // TODO Put N elements in x
    return x + N;
}

where this function should write the result into x, and then return a pointer just past the last element written (x + N).

Internally, in this function, I am using Eigen extensively to perform some calculations. Then I write the result back to the raw pointer using the Map class. A simple example which mimics what I am doing is this:

const int N = 5;
T * func(T * x)  {

    // Do a lot of operations that result in some matrices like
    Eigen::Matrix<T, N, 1 > A = ...;
    Eigen::Matrix<T, N, 1 > B = ...;

    Eigen::Map<Eigen::Matrix<T, N, 1 >> constraint(x);
    constraint = A - B;

    return x + N;
}

Obviously, there is much more complicated stuff going on internally, but that is the gist of it... Do some calculations with Eigen, then use the Map class to write the result back to the raw pointer.

Now the problem is that when I profile this code with Callgrind, and then view the results with KCachegrind, the lines

constraint = A - B;

are almost always the bottleneck. This is sort of understandable, because such a line is potentially doing three things:

  1. Constructing the Map object
  2. Performing the calculation
  3. Writing the result to the pointer

So it is understandable that this line might have the longest runtime. But I am a little bit worried that perhaps I am somehow doing an extra copy in that line before the data gets written to the raw pointer.

So is there a better way of writing the result to the raw pointer? Or is that the idiom I should be using?

In the back of my mind, I am wondering if using the placement new syntax would buy me anything here.

Note: This code is mission critical and should run in realtime, so I really need to squeeze every ounce of speed out of it. For instance, getting this call from a runtime of 0.12 seconds to 0.1 seconds would be huge for us. But code legibility is also a huge concern since we are constantly tweaking the model used in the internal calculations.

Upvotes: 1

Views: 224

Answers (1)

ggael

Reputation: 29225

These two lines of code:

Eigen::Map<Eigen::Matrix<T, N, 1 >> constraint(x);
constraint = A - B;

are essentially compiled by Eigen as:

for(int i=0; i<N; ++i)
  x[i] = A[i] - B[i];

The reality is a bit more complicated because of explicit unrolling and explicit vectorization (both depend on T), but that's essentially it. So the construction of the Map object is essentially a no-op (it is optimized away by any optimizing compiler) and no, there is no extra copy going on here.
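To see why no temporary is needed, here is a toy model of the idea in plain C++. It is only a sketch: ToyVec, ToyDiff, ToyMap and toy_func are hypothetical names made up for illustration, not Eigen's real classes, and Eigen's actual implementation is far more elaborate. The point is just that `A - B` can be a lightweight expression object, and that assigning it to a pointer-backed view evaluates each coefficient straight into the destination memory.

```cpp
#include <cstddef>

// Toy fixed-size vector (hypothetical stand-in for Eigen::Matrix<T, N, 1>).
template <typename T, std::size_t Len>
struct ToyVec {
    T data[Len];
    T operator[](std::size_t i) const { return data[i]; }
};

// Lazy "lhs - rhs" expression: stores references only, computes nothing
// until a coefficient is requested through operator[].
template <typename T, std::size_t Len>
struct ToyDiff {
    const ToyVec<T, Len>& lhs;
    const ToyVec<T, Len>& rhs;
    T operator[](std::size_t i) const { return lhs[i] - rhs[i]; }
};

template <typename T, std::size_t Len>
ToyDiff<T, Len> operator-(const ToyVec<T, Len>& a, const ToyVec<T, Len>& b) {
    return {a, b};
}

// Non-owning view over caller-provided memory (hypothetical stand-in for
// Eigen::Map). Constructing it only stores a pointer.
template <typename T, std::size_t Len>
struct ToyMap {
    T* ptr;
    explicit ToyMap(T* p) : ptr(p) {}

    // Assignment from an expression: one pass, written directly to ptr.
    template <typename Expr>
    ToyMap& operator=(const Expr& e) {
        for (std::size_t i = 0; i < Len; ++i) ptr[i] = e[i];
        return *this;
    }
};

constexpr std::size_t N = 5;

// Same shape as func in the question, with made-up values for A and B.
double* toy_func(double* x) {
    ToyVec<double, N> A{{5.0, 4.0, 3.0, 2.0, 1.0}};
    ToyVec<double, N> B{{1.0, 1.0, 1.0, 1.0, 1.0}};

    ToyMap<double, N> constraint(x);
    constraint = A - B;  // evaluates A[i] - B[i] directly into x[i]

    return x + N;
}
```

With optimizations enabled, a compiler can typically inline all of this down to the plain loop shown above, which is the same reason the Map construction costs nothing in the real code.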

Actually, if your profiler is able to tell you that the bottleneck lies in this simple expression, then that very likely means that this piece of code has not been inlined, meaning that you did not enable compiler optimization flags (like -O3 with gcc/clang).
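One quick way to check which kind of build you are actually profiling is to look at a predefined macro. The sketch below assumes gcc or clang, which define `__OPTIMIZE__` when optimization is enabled; other compilers may not define it at all, and `build_mode` is a hypothetical helper name used only for illustration.

```cpp
#include <string>

// Reports whether this translation unit was compiled with optimization.
// Assumption: the compiler is gcc or clang, which predefine __OPTIMIZE__
// for optimized builds. On other compilers the fallback branch is taken.
std::string build_mode() {
#if defined(__OPTIMIZE__)
    return "optimized";
#else
    return "unoptimized (or compiler does not define __OPTIMIZE__)";
#endif
}
```

Printing the result of `build_mode()` once at startup of the profiled binary is a cheap sanity check before trusting per-line timings.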

Upvotes: 1
