Daniel Marchand

Reputation: 644

Basic ways to speed up a simple Eigen program

I'm looking for the fastest way to do simple operations using Eigen. There are so many data structures available that it's hard to tell which is the fastest.

I've tried to predefine my data structures, but even then my code is being outperformed by similar Fortran code. I've guessed that Eigen::Vector3d is the fastest for my needs (since it's predefined), but I could easily be wrong. Compiling with -O3 optimization gave me a big boost, but I'm still running 4x slower than a Fortran implementation of the same code.

I make use of an 'Atom' structure, which is then stored in an 'atoms' vector defined by the following:

struct Atom {
    std::string element;
    //double x, y, z;
    Eigen::Vector3d coordinate;
};
std::vector<Atom> atoms;

The slowest part of my code is the following:

distance = atoms[i].coordinate - atoms[j].coordinate;
distance_norm = distance.norm();

Is there a faster data structure I could use? Or is there a faster way to perform these basic operations?

Upvotes: 5

Views: 6573

Answers (3)

Avi Ginsburg

Reputation: 10596

As you pointed out in your comment, adding the -fno-math-errno compiler flag gives you a huge increase in speed. As to why that happens, your code snippet shows that you're doing a sqrt via distance_norm = distance.norm();.

With this flag the compiler no longer has to set errno after each sqrt (that's a saved write to a thread-local variable), which is faster and enables vectorization of any loop that does this repeatedly. The only disadvantage is that math functions no longer report errors through errno, so strict standard conformance is lost. See the gcc man page.
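To make the point concrete, here is a minimal sketch of the kind of loop that is affected. It uses plain C++ rather than Eigen so that it stands alone; `Point3` and `distances_from` are hypothetical names made up for illustration, not anything from your code. Each iteration ends in a `std::sqrt`, which is the call that `-fno-math-errno` frees from the errno bookkeeping.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical plain-struct stand-in for the coordinates in the question.
struct Point3 {
    double x, y, z;
};

// Writes the distance from `origin` to each point into `out`.
// Every iteration ends in a std::sqrt. With GCC's default errno handling the
// compiler must keep a path that can set errno for invalid inputs; building
// with something like
//   g++ -O3 -fno-math-errno example.cpp   (file name is a placeholder)
// removes that requirement for this loop.
void distances_from(const std::vector<Point3>& pts, const Point3& origin,
                    std::vector<double>& out) {
    assert(out.size() == pts.size());  // precondition: caller sizes `out`
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const double dx = pts[i].x - origin.x;
        const double dy = pts[i].y - origin.y;
        const double dz = pts[i].z - origin.z;
        out[i] = std::sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

Whether the loop actually gets vectorized still depends on the compiler version and the other flags in use, so it is worth benchmarking with and without the flag.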

Another thing you might want to try is adding -march=native, plus -mfma if -march=native doesn't turn it on for you (I seem to remember that in some cases it wasn't turned on by native and had to be turned on by hand - check here for details). And as always with Eigen, you can disable its assertions, including bounds checking, with -DNDEBUG.

Use SoA (structure of arrays) instead of AoS (array of structures)! If performance is actually a real problem, consider using a single 4xN matrix to store the positions (and have Atom keep the column index instead of the Eigen::Vector3d). It shouldn't matter too much in the small code snippet you showed, but depending on the rest of your code, it may give you another huge increase in performance.
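A rough sketch of that layout, written without Eigen so it compiles on its own: the flat buffer below is laid out like a column-major 4xN matrix, and `Positions`, `add`, `count` and `distance` are hypothetical names chosen for illustration. The fourth row is treated as zero padding, which is my assumption about why 4 rather than 3 rows are suggested. With Eigen you would presumably use something like `Eigen::Matrix<double, 4, Eigen::Dynamic>` instead of the raw vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// The Atom keeps only its element and a column index into the shared storage.
struct Atom {
    std::string element;
    std::size_t column;
};

// Hypothetical position store: one contiguous buffer, 4 doubles per atom
// (x, y, z, zero padding), i.e. the memory layout of a column-major 4xN matrix.
struct Positions {
    std::vector<double> data;

    // Appends one position and returns its column index.
    std::size_t add(double x, double y, double z) {
        data.insert(data.end(), {x, y, z, 0.0});
        return data.size() / 4 - 1;
    }

    std::size_t count() const { return data.size() / 4; }

    // Euclidean distance between the positions in columns i and j.
    double distance(std::size_t i, std::size_t j) const {
        assert(i < count() && j < count());
        const double* a = &data[4 * i];
        const double* b = &data[4 * j];
        double sum = 0.0;
        for (int k = 0; k < 4; ++k) {  // the padding row contributes 0
            const double d = a[k] - b[k];
            sum += d * d;
        }
        return std::sqrt(sum);
    }
};
```

Usage would look like `atoms.push_back(Atom{"O", pos.add(3.0, 4.0, 0.0)});` followed by `pos.distance(atoms[i].column, atoms[j].column)`. The point of the design is that all coordinates sit next to each other in memory, with no std::string in between.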

Upvotes: 6

Dan

Reputation: 125

Either try another compiler, like the Intel C++ compiler (free for academic and non-profit usage), or use other libraries, like Intel MKL (far faster than your own code) or other BLAS/LAPACK implementations for dense matrices, or PARDISO or SuperLU (not sure if it still exists) for sparse matrices.

Upvotes: -1

keith

Reputation: 5332

Given you are ~4x off, it might be worth checking that you have enabled vectorization such as AVX or AVX2 at compile time. There are of course also SSE2 (~2x) and AVX512 (~8x) when dealing with doubles.
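One way to check this from inside the program is to look at the macros that GCC and Clang predefine for each enabled instruction set. This is only a sketch under that assumption (other compilers may define different macros), and `enabled_simd` and `doubles_per_register` are hypothetical helper names.

```cpp
#include <cassert>
#include <string>

// Reports the widest SIMD instruction set the compiler was allowed to target,
// based on macros predefined by GCC and Clang for the active -m flags.
std::string enabled_simd() {
#if defined(__AVX512F__)
    return "AVX512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none detected";
#endif
}

// Number of doubles that fit in one vector register for that instruction set.
// This is where the rough ~2x / ~4x / ~8x figures for doubles come from.
int doubles_per_register() {
#if defined(__AVX512F__)
    return 8;  // 512-bit registers
#elif defined(__AVX__)
    return 4;  // 256-bit registers (AVX and AVX2)
#elif defined(__SSE2__)
    return 2;  // 128-bit registers
#else
    return 1;  // scalar fallback
#endif
}
```

Printing `enabled_simd()` once at startup, for a build with and without -march=native, shows whether the flag changed anything on your machine.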

Upvotes: 0
