Reputation: 614
I am wondering how to write code that at times makes use of OpenMP parallelization built into the Eigen library while at other times uses Parallelization that I specify. Hopefully, the below code snippet should provide background into my problem. I am asking this question at the design stage of my library (sorry I don't have a working / broken code example).
#ifdef _OPENMP
#include <omp.h>
#endif
#include <RcppEigen.h>
void fxn(..., int ncores=-1){
if (ncores > 0) omp_set_num_threads(ncores);
/*
* Code with matrix products
* where I would like to use Eigen's
* OpenMP parallelization
*/
#pragma omp parallel for
for (int i=0; i < iter; i++){
/*
* Code I would like to parallelize "myself"
* even though it involves matrix products
*/
}
}
What is best practice for controlling the balance between Eigen's own parallelization with OpenMP and my own.
I wrote a simple example and tested ggael's suggestion. In short, I am skeptical that it solves the problem I was posing (or I am doing something else wrong - apologies if its the latter). Notice that with explicit parallelization of the for loop there is no change in run-time (not even a slow
#ifdef _OPENMP
#include <omp.h>
#endif
#include <RcppEigen.h>
using namespace Rcpp;
// [[Rcpp::plugins(openmp)]]
// [[Rcpp::export]]
Eigen::MatrixXd testing(Eigen::MatrixXd A, Eigen::MatrixXd B, int n_threads=1){
Eigen::setNbThreads(n_threads);
Eigen::MatrixXd C = A*B;
Eigen::setNbThreads(1);
for (int i=0; i < A.cols(); i++){
A.col(i).array() = A.col(i).array()*B.col(i).array();
}
return A;
}
// [[Rcpp::export]]
Eigen::MatrixXd testing_omp(Eigen::MatrixXd A, Eigen::MatrixXd B, int n_threads=1){
Eigen::setNbThreads(n_threads);
Eigen::MatrixXd C = A*B;
Eigen::setNbThreads(1);
#pragma omp parallel for num_threads(n_threads)
for (int i=0; i < A.cols(); i++){
A.col(i).array() = A.col(i).array()*B.col(i).array();
}
return A;
}
/*** R
A <- matrix(rnorm(1000*1000), 1000, 1000)
B <- matrix(rnorm(1000*1000), 1000, 1000)
microbenchmark::microbenchmark(testing(A,B, n_threads=1),
testing_omp(A,B, n_threads=1),
testing(A,B, n_threads=8),
testing_omp(A,B, n_threads=8),
times=10)
*/
Unit: milliseconds
expr min lq mean median uq max neval cld
testing(A, B, n_threads = 1) 169.74272 183.94500 212.83868 218.15756 236.97049 264.52183 10 b
testing_omp(A, B, n_threads = 1) 166.53132 178.48162 210.54195 227.65258 234.16727 238.03961 10 b
testing(A, B, n_threads = 8) 56.03258 61.16001 65.15763 62.67563 67.37089 83.43565 10 a
testing_omp(A, B, n_threads = 8) 54.18672 57.78558 73.70466 65.36586 67.24229 167.90310 10 a
Upvotes: 3
Views: 516
Reputation: 29205
The easiest is probably to disable/enable Eigen's multi-threading at runtime:
Eigen::setNbThreads(1); // single thread mode
#pragma omp parallel for
for (int i=0; i < iter; i++){
// Code I would like to parallelize "myself"
// even though it involves matrix products
}
Eigen::setNbThreads(0); // restore default
Upvotes: 4