Reputation: 11
To accelerate my package, which include plenty of matrix calculation, i use Rcpp to rewrite all the code. However, some functions are even slower than before. I use microbenchmark to analyze, and find the the matrix multiplication in Rcpp is slower. Why this will happen? And how to accelerate my package? Thanks a lot. The Rcpp code is as follows:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix mmult(const NumericMatrix& a, const NumericMatrix& b){
if (a.ncol() != b.nrow()) stop ("Incompatible matrix dimensions");
NumericMatrix out(a.nrow(),b.ncol());
NumericVector rm1, cm2;
for (int i = 0; i < a.nrow(); ++i) {
rm1 = a(i,_);
for (int j = 0; j < b.ncol(); ++j) {
cm2 = b(_,j);
out(i,j) = std::inner_product(rm1.begin(), rm1.end(), cm2.begin(), 0.);
}
}
return out;}
The R code is as follows:
X = matrix(rnorm(10*10,1),10,10)
Y = matrix(rnorm(10*10,1),10,10)
microbenchmark(
mmult(X,Y),
X%*%Y)
The result is:
Unit: microseconds
expr min lq mean median uq max neval
mmult(X, Y) 45.720 48.9860 126.79228 50.385 51.785 6368.512 100
X %*% Y 5.599 8.8645 12.85787 9.798 10.730 153.486 100
Upvotes: 0
Views: 283
Reputation: 26823
This is the opposite but expected result from what was seen for matrix-vector multiplication. Here R is using BLAS to do all the heavy work, which might even work in parallel. You are throwing away all the optimized memory management done in the BLAS library by using your naive matrix multiplication.
Instead of trying to reinvent the low-level stuff like matrix multiplication, you could try to implement larger parts of your code using something like RcppArmadillo, which uses the same BLAS library as R but also (not only!) offers a convenient syntax on top of that.
Upvotes: 3