I am fitting a statistical model whose likelihood and gradient evaluations depend on many complicated matrix operations, for which the Eigen library is very well-suited. I am working within Rcpp and having great success with RcppEigen. My likelihood/gradient contain a loop over the data size that is, in theory, trivially parallelizable. I would like to parallelize my likelihood computation using RcppParallel.
RcppParallel requires input of an Rcpp::NumericVector or Rcpp::NumericMatrix. I am able to transform between Eigen::MatrixXd and Rcpp::NumericMatrix thanks to a very helpful post here. In my example below, I reproduce one of the RcppParallel examples using this conversion. However, I find the performance hit of doing so is too severe to make the parallel computation worthwhile.
My questions are as follows:

- Can an RcppParallel worker take/operate on an Eigen::MatrixXd directly?
- If not, is there some other way to parallelize computations involving Eigen-specific matrix algebra?

Some further information:

- Rewriting my code to avoid Eigen is very undesirable.

Here is an example from the RcppParallel documentation, and a modification with Eigen classes that does compile, run, and give the right answer, but incurs a substantial performance hit. My desired outcome is to be able to input and output the Eigen-classed objects without incurring this performance hit, either with RcppParallel or some other method of parallelization.
Note that this case doesn't even cover what I actually want, which is to be able to use Eigen::MatrixXd etc. within the RcppParallel Worker. The RcppParallel documentation appears to explicitly advise against doing this type of conversion within the Worker: "The code that you write within parallel workers should not call the R or Rcpp API in any fashion." Nonetheless, even the example below incurs a performance hit, which has me wondering about the overall feasibility of what I would like to do here.
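For reference, my (possibly mistaken) understanding is that both Rcpp::NumericMatrix and Eigen::MatrixXd ultimately store their elements in one contiguous column-major double buffer, so the parallel kernel itself could in principle be written purely against double*, with no R, Rcpp, or Eigen types appearing inside the worker at all. A minimal standard-C++ sketch of that idea (the function name and threading scheme are mine, not anything from RcppParallel):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Apply sqrt element-wise over [input, input + n), writing into output,
// splitting the index range across a fixed number of threads. The loop
// body touches only raw doubles, so nothing here calls back into R.
void parallel_sqrt(const double* input, double* output,
                   std::size_t n, unsigned nthreads = 4) {
    std::vector<std::thread> pool;
    std::size_t chunk = (n + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        pool.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i)
                output[i] = std::sqrt(input[i]);
        });
    }
    for (auto& th : pool) th.join();
}
```

Inside an RcppParallel Worker I would expect the same loop body to work unchanged, with begin/end supplied by parallelFor; the point is only that neither Eigen nor Rcpp types need to appear in the hot path.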
// [[Rcpp::depends(RcppEigen)]]
// [[Rcpp::depends(RcppParallel)]]
#include <RcppEigen.h>   // pulls in Rcpp; do not also include Rcpp.h directly
#include <RcppParallel.h>
#include <algorithm>
#include <cmath>
using namespace Rcpp;
using namespace RcppParallel;

// Square root all the elements in a big matrix
struct SquareRoot : public Worker {
  const RMatrix<double> input;
  RMatrix<double> output;

  // Constructor
  SquareRoot(const NumericMatrix input, NumericMatrix output)
    : input(input), output(output) {}

  // Call operator: square-root the elements in [begin, end)
  void operator()(std::size_t begin, std::size_t end) {
    std::transform(input.begin() + begin,
                   input.begin() + end,
                   output.begin() + begin,
                   static_cast<double(*)(double)>(std::sqrt));
  }
};

// [[Rcpp::export]]
NumericMatrix parallelMatrixSqrt(NumericMatrix x) {
  NumericMatrix output(x.nrow(), x.ncol());
  SquareRoot mysqrt(x, output);
  parallelFor(0, x.length(), mysqrt, 100);
  return output;
}

// Try it with an Eigen input and output, compare performance
// [[Rcpp::export]]
Eigen::MatrixXd parallelMatrixSqrtEigen(Eigen::MatrixXd x) {
  // Convert to NumericMatrix (this copies all of x)
  SEXP s = wrap(x);
  NumericMatrix xR(s);
  NumericMatrix output(x.rows(), x.cols());

  // Do the parallel computations
  SquareRoot mysqrt(xR, output);
  parallelFor(0, xR.length(), mysqrt, 100);

  // Convert back (another full copy)
  return as<Eigen::MatrixXd>(output);
}
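If I understand correctly (and I may not), the slowdown comes from the two full copies wrapped around the parallel region: wrap(x) allocates a fresh R matrix and copies all n^2 doubles in, and as<Eigen::MatrixXd>(output) copies them back out, so the Eigen version pays roughly two extra 8 MB copies plus allocation on every call. A toy standard-C++ illustration of the two memory-traffic patterns, with no Rcpp or Eigen involved (function names are mine):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Direct style: read from src, write results straight into dst.
void sqrt_direct(const std::vector<double>& src, std::vector<double>& dst) {
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = std::sqrt(src[i]);
}

// Copy-in/copy-out style, mimicking what I believe wrap()/as<>() do:
// the same kernel, but bracketed by two full copies of the data.
void sqrt_via_copies(const std::vector<double>& src, std::vector<double>& dst) {
    std::vector<double> tmp_in(src);             // analogous to wrap(x)
    std::vector<double> tmp_out(tmp_in.size());
    for (std::size_t i = 0; i < tmp_in.size(); ++i)
        tmp_out[i] = std::sqrt(tmp_in[i]);
    dst = tmp_out;                               // analogous to as<Eigen::MatrixXd>(output)
}
```

For a memory-bound kernel like an element-wise square root, those extra copies can plausibly dominate the runtime, which would be consistent with the benchmark results below where parallelMatrixSqrtEigen is slower than even the serial sqrt(mm).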
Performance:
codepath <- "~/work/projects/misc/rcppparallel"
codefile <- "try-rcppparallel.cpp"
library(Rcpp)
library(RcppParallel)
sourceCpp(file.path(codepath,codefile))
n <- 1000
mm <- matrix(rnorm(n*n)^2,n,n)
microbenchmark::microbenchmark(
sqrt(mm),
parallelMatrixSqrt(mm),
parallelMatrixSqrtEigen(mm)
)
Results:
Unit: microseconds
                        expr      min        lq      mean    median       uq      max neval cld
                    sqrt(mm) 1179.242 1294.6775 1692.3123 1384.7955 1464.213 3581.924   100  b
      parallelMatrixSqrt(mm)  321.973  496.3665  909.3078  571.0685  784.617 3582.785   100  a
 parallelMatrixSqrtEigen(mm) 3223.133 3504.1675 4422.4277 4071.7715 5313.190 6737.735   100  c
A major thank you in advance!