OpenMP and (Rcpp)Eigen

Question

I am wondering how to write code that at times makes use of OpenMP parallelization built into the Eigen library while at other times uses Parallelization that I specify. Hopefully, the below code snippet should provide background into my problem. I am asking this question at the design stage of my library (sorry I don't have a working / broken code example).

#ifdef _OPENMP
  #include 
#endif

#include 

void fxn(..., int ncores=-1){
  if (ncores > 0) omp_set_num_threads(ncores);
  /*
  * Code with matrix products 
  * where I would like to use Eigen's 
  * OpenMP parallelization
  */ 

  #pragma omp parallel for
  for (int i=0; i < iter; i++){
  /* 
  * Code I would like to parallelize "myself"
  * even though it involves matrix products
  */
  }
}

What is best practice for controlling the balance between Eigen's own parallelization with OpenMP and my own.

UPDATE:

I wrote a simple example and tested ggael's suggestion. In short, I am skeptical that it solves the problem I was posing (or I am doing something else wrong - apologies if its the latter). Notice that with explicit parallelization of the for loop there is no change in run-time (not even a slow

#ifdef _OPENMP
  #include 
#endif 
#include 

using namespace Rcpp;
// [[Rcpp::plugins(openmp)]]

// [[Rcpp::export]]
Eigen::MatrixXd testing(Eigen::MatrixXd A, Eigen::MatrixXd B, int n_threads=1){
  Eigen::setNbThreads(n_threads);
  Eigen::MatrixXd C = A*B;
  Eigen::setNbThreads(1);
  for (int i=0; i < A.cols(); i++){
    A.col(i).array() = A.col(i).array()*B.col(i).array(); 
  }
  return A;
}

// [[Rcpp::export]]
Eigen::MatrixXd testing_omp(Eigen::MatrixXd A, Eigen::MatrixXd B, int n_threads=1){
  Eigen::setNbThreads(n_threads);
  Eigen::MatrixXd C = A*B;
  Eigen::setNbThreads(1);
  #pragma omp parallel for num_threads(n_threads)
  for (int i=0; i < A.cols(); i++){
    A.col(i).array() = A.col(i).array()*B.col(i).array(); 
  }
  return A;
}


/*** R
A <- matrix(rnorm(1000*1000), 1000, 1000)
B <- matrix(rnorm(1000*1000), 1000, 1000)
microbenchmark::microbenchmark(testing(A,B, n_threads=1),
                               testing_omp(A,B, n_threads=1),
                               testing(A,B, n_threads=8), 
                               testing_omp(A,B, n_threads=8), 
                               times=10)
*/

Unit: milliseconds
                             expr       min        lq      mean    median        uq       max neval cld
     testing(A, B, n_threads = 1) 169.74272 183.94500 212.83868 218.15756 236.97049 264.52183    10   b
 testing_omp(A, B, n_threads = 1) 166.53132 178.48162 210.54195 227.65258 234.16727 238.03961    10   b
     testing(A, B, n_threads = 8)  56.03258  61.16001  65.15763  62.67563  67.37089  83.43565    10  a 
 testing_omp(A, B, n_threads = 8)  54.18672  57.78558  73.70466  65.36586  67.24229 167.90310    10  a

OpenMP and (Rcpp)Eigen

UPDATE:

Answers (1)

Related Questions