I want to use Eigen matrices in combination with OpenMP reduction.
Below is a small example of how I do it (and it works). The class myclass
has three attributes (an Eigen matrix and two integers holding its dimensions) and a member function do_something
that uses an OpenMP reduction on a sum, which I have to declare myself because Eigen matrices are not standard types.
#include "Eigen/Core"

class myclass {
public:
    Eigen::MatrixXd m_mat;
    int m_n; // number of rows in m_mat
    int m_p; // number of cols in m_mat
    myclass(int n, int p); // constructor
    void do_something(); // omp reduction on `m_mat`
};

myclass::myclass(int n, int p) {
    m_n = n;
    m_p = p;
    m_mat = Eigen::MatrixXd::Zero(m_n, m_p); // init m_mat with null values
}
#pragma omp declare reduction (+: Eigen::MatrixXd: omp_out=omp_out+omp_in) \
    initializer(omp_priv=Eigen::MatrixXd::Zero(omp_orig.rows(), omp_orig.cols()))
void myclass::do_something() {
    Eigen::MatrixXd tmp = Eigen::MatrixXd::Zero(m_n, m_p); // temporary matrix
    #pragma omp parallel for reduction(+:tmp)
    for (int i = 0; i < m_n; i++) {
        for (int l = 0; l < m_n; l++) {
            for (int j = 0; j < m_p; j++) {
                tmp(l, j) += 10;
            }
        }
    }
    m_mat = tmp;
}
Problem: OpenMP does not allow (or at least not all implementations allow) reductions on class members, only on variables. So I do the reduction on a temporary matrix and pay for the copy m_mat = tmp
at the end, which I would like to avoid (because m_mat
can be a big matrix and I use this reduction a lot in my code).
Wrong fix: I tried to use an Eigen Map so that tmp
refers to the data stored in m_mat
. I replaced the omp reduction declaration and the do_something
member function definition in the previous code with:
#pragma omp declare reduction (+: Eigen::Map<Eigen::MatrixXd>: omp_out=omp_out+omp_in) \
    initializer(omp_priv=Eigen::MatrixXd::Zero(omp_orig.rows(), omp_orig.cols()))
void myclass::do_something() {
    Eigen::Map<Eigen::MatrixXd> tmp = Eigen::Map<Eigen::MatrixXd>(m_mat.data(), m_n, m_p);
    #pragma omp parallel for reduction(+:tmp)
    for (int i = 0; i < m_n; i++) {
        for (int l = 0; l < m_n; l++) {
            for (int j = 0; j < m_p; j++) {
                tmp(l, j) += 10;
            }
        }
    }
}
However, it does not work anymore and I get the following error at compilation:
error: conversion from ‘const ConstantReturnType {aka const Eigen::CwiseNullaryOp, Eigen::Matrix >}’ to non-scalar type ‘Eigen::Map, 0, Eigen::Stride<0, 0> >’ requested
    initializer(omp_priv=Eigen::MatrixXd::Zero(omp_orig.rows(), omp_orig.cols()))
I understand that the implicit conversion from Eigen::MatrixXd
to Eigen::Map<Eigen::MatrixXd>
does not work in the omp reduction, but I don't know how to make it work.
Thanks in advance
Edit 1: I forgot to mention that I use gcc v5.4 on an Ubuntu machine (tried both 16.04 and 18.04).
Edit 2: I modified my example as there was no reduction in the first one. This example is not exactly what I do in my code; it is just a minimal "dumb" example.
Upvotes: 2
Views: 852
As @ggael mentioned in their answer, Eigen::Map
can't be used for this because it needs to map existing storage. Even if you did make it work, all threads would share the same underlying memory, which would create a race condition.
The likeliest way to avoid the temporary you create in the initial thread is to bind the member variable to a local reference, which should always be valid for use in a reduction. That would look something like this:
void myclass::do_something() {
    Eigen::MatrixXd &loc_ref = m_mat; // local binding
    #pragma omp parallel for reduction(+:loc_ref)
    for (int i = 0; i < m_n; i++) {
        for (int l = 0; l < m_n; l++) {
            for (int j = 0; j < m_p; j++) {
                loc_ref(l, j) += 10;
            }
        }
    }
    // m_mat = tmp; no longer necessary, reducing into the original
}
That said, note that this still creates a local copy of the zero matrix in each and every thread, much like @ggael showed in his example, so using a reduction this way is quite expensive. If the actual code resembles the snippet, where values are accumulated through nested loops, the reduction can be avoided entirely by dividing the work so that either each thread writes to a distinct region of the matrix (Solution 1) or each update is made atomic (Solution 2).
Solution 1 example:
void myclass::do_something() {
    // loop transposed so threads split across l;
    // each thread then owns distinct rows of m_mat
    #pragma omp parallel for
    for (int l = 0; l < m_n; l++) {
        for (int i = 0; i < m_n; i++) {
            for (int j = 0; j < m_p; j++) {
                m_mat(l, j) += 10;
            }
        }
    }
}
Solution 2 example:
void myclass::do_something() {
    #pragma omp parallel for
    for (int i = 0; i < m_n; i++) {
        for (int l = 0; l < m_n; l++) {
            for (int j = 0; j < m_p; j++) {
                // use the reference to get a variable binding through operator()
                auto &target = m_mat(l, j);
                #pragma omp atomic
                target += 10;
            }
        }
    }
}
Upvotes: 4
The problem is that Eigen::Map
can only be created over an existing memory buffer. In your example, the underlying OpenMP implementation will try to do something like this:

Eigen::Map<MatrixXd> tmp_0 = MatrixXd::Zero(r,c);
Eigen::Map<MatrixXd> tmp_1 = MatrixXd::Zero(r,c);
...
/* parallel code, thread #i accumulates in tmp_i */
...
tmp = tmp_0 + tmp_1 + ...;
and stuff like Map<MatrixXd> tmp_0 = MatrixXd::Zero(r,c)
is of course not possible: omp_priv
has to be a MatrixXd
. I don't know if it's possible to customize the type of the private temporaries created by OpenMP. If not, you can do the job by hand by creating a std::vector<MatrixXd> tmps(omp_get_max_threads());
and doing the final reduction yourself, or better: don't bother about making a single additional copy, it will be largely negligible compared to all the other work and copies done by OpenMP itself.
Upvotes: 2