Reputation: 15
According to the Eigen documentation, as long as the proper compile flag is set and the OMP_NUM_THREADS=x is defined, all sparse matrix/dense vector multiplications will run in parallel, no matter where the multiplication takes place. After doing both, however, htop showed that only one core was in use the whole time.
I'm concerned with lines 58 and 98 of the following source code, where the sm/dv multiplications take place. One thing to note is that this code is part of Eigen's unsupported iterative solver module, but I don't think that explains the failure to parallelize.
https://eigen.tuxfamily.org/dox/unsupported/MINRES_8h_source.html
The platform is Xeon Gold 6126, and the compile flags I used are
CC=g++
FLAGS=-std=c++11 -m64 -O3 -fopenmp -march=skylake-avx512
I submit the job with the following script
#!/bin/bash
#something
#SBATCH -n 8
#something
OMP_NUM_THREADS=8 ./my_executable
which I assume sets up OpenMP properly.
I vaguely recall someone mentioning that, to take advantage of multiple cores, the sparse matrix has to be filled completely rather than just the upper or lower triangle. I filled only the upper triangle, and I'm not sure whether this is the cause.
Any suggestions as to what I missed? Thanks in advance.
Upvotes: 0
Views: 936
Reputation: 23788
This is not correct:
as long as the proper compile flag is set and the OMP_NUM_THREADS=x is defined, all sparse matrix/dense vector multiplications will run in parallel
As described in the documentation, thread parallelization with OpenMP is available for row-major-sparse * dense vector/matrix products.
The default storage order of a SparseMatrix in Eigen is column major, for which the parallelization does not apply. For parallel MVPs with OpenMP, a double precision sparse matrix should be defined like this:
Eigen::SparseMatrix<double, Eigen::RowMajor>
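To illustrate why the storage order plausibly matters, here is a minimal sketch of a row-major (CSR) sparse matrix times dense vector product. It is not Eigen's implementation: CsrMatrix and csr_times_vector are hypothetical names made up for this example. The point is that with row-major storage each output entry y[i] depends only on row i, so the outer loop can be split across threads without any write conflicts. With column-major storage, each column scatters contributions into many entries of y, so a naive parallel loop over columns would have threads writing to the same entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR (row-major compressed) container, not an Eigen type.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_start;  // size rows + 1; row i spans [row_start[i], row_start[i+1])
    std::vector<std::size_t> col_index;  // column of each stored entry
    std::vector<double> value;           // value of each stored entry
};

// y = A * x. Each row is independent, so the outer loop is safe to parallelize.
// Without -fopenmp the pragma is ignored and the loop simply runs serially.
std::vector<double> csr_times_vector(const CsrMatrix& A, const std::vector<double>& x) {
    assert(x.size() == A.cols);
    assert(A.row_start.size() == A.rows + 1);

    std::vector<double> y(A.rows, 0.0);
    const long n = static_cast<long>(A.rows);

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = A.row_start[i]; k < A.row_start[i + 1]; ++k) {
            sum += A.value[k] * x[A.col_index[k]];
        }
        y[i] = sum;  // only this iteration writes y[i]
    }
    return y;
}
```

For the 2x2 matrix with rows (1, 2) and (0, 3), the fields would be row_start = {0, 2, 3}, col_index = {0, 1, 1}, value = {1, 2, 3}.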
BTW, it is not necessary to specify OMP_NUM_THREADS. By default, this value is set to the maximum number of available threads.
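If you want to check how many threads the OpenMP runtime will actually hand out inside the job, a small helper like the one below may help. It is only a sketch: max_openmp_threads is a hypothetical name, and it reports 1 when the program was built without -fopenmp. In an Eigen program, Eigen::nbThreads() reports the number of threads Eigen itself intends to use.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the threads a parallel region would get.
// Returns 1 if OpenMP was not enabled at compile time.
int max_openmp_threads() {
#ifdef _OPENMP
    const int n = omp_get_max_threads();
#else
    const int n = 1;
#endif
    assert(n >= 1);
    return n;
}
```

Printing this value at the start of the run shows whether the job environment is limiting the thread count before any matrix product is executed.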
Upvotes: 1