why Rcpp is slower than R when using matrix multiply?

Question

To accelerate my package, which include plenty of matrix calculation, i use Rcpp to rewrite all the code. However, some functions are even slower than before. I use microbenchmark to analyze, and find the the matrix multiplication in Rcpp is slower. Why this will happen? And how to accelerate my package? Thanks a lot. The Rcpp code is as follows:

#include 
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix mmult(const NumericMatrix& a, const NumericMatrix& b){
if (a.ncol() != b.nrow()) stop ("Incompatible matrix dimensions");
NumericMatrix out(a.nrow(),b.ncol());
NumericVector rm1, cm2;
for (int i = 0; i < a.nrow(); ++i) {
  rm1 = a(i,_);
  for (int j = 0; j < b.ncol(); ++j) {
    cm2 = b(_,j);
    out(i,j) = std::inner_product(rm1.begin(), rm1.end(), cm2.begin(), 0.);
  }
}
return out;}

The R code is as follows:

X = matrix(rnorm(10*10,1),10,10)
Y = matrix(rnorm(10*10,1),10,10)


microbenchmark(
  mmult(X,Y),
  X%*%Y)

The result is:

Unit: microseconds
    expr    min      lq      mean median     uq      max neval
 mmult(X, Y) 45.720 48.9860 126.79228 50.385 51.785 6368.512   100
 X %*% Y  5.599  8.8645  12.85787  9.798 10.730  153.486   100

Ralf Stubner · Accepted Answer

This is the opposite but expected result from what was seen for matrix-vector multiplication. Here R is using BLAS to do all the heavy work, which might even work in parallel. You are throwing away all the optimized memory management done in the BLAS library by using your naive matrix multiplication.

Instead of trying to reinvent the low-level stuff like matrix multiplication, you could try to implement larger parts of your code using something like RcppArmadillo, which uses the same BLAS library as R but also (not only!) offers a convenient syntax on top of that.

why Rcpp is slower than R when using matrix multiply?

Answers (1)

Related Questions