dmvnorm MVN density - RcppArmadillo implementation slower than R package including a bit of Fortran

Question

The solution is now online in the Rcpp Gallery

I re-implemented dmvnorm from the mvtnorm package in RcppArmadillo. I somehow like Armadillo, but I guess it would also work in plain Rcpp. The approach from dmvnorm is based on the mahalanobis distance, so I have a function for that and then the multivariate normal density function.

Let me show you my code:

#include 
#include 

// [[Rcpp::depends("RcppArmadillo")]]

// [[Rcpp::export]]
arma::vec mahalanobis_arma( arma::mat x ,  arma::mat mu, arma::mat sigma ){

  int n = x.n_rows;
  arma::vec md(n);
    for (int i=0; i



So, and not to my big disappointment:

simulate some data

sigma <- matrix(c(4,2,2,3), ncol=2)
x <- rmvnorm(n=5000000, mean=c(1,2), sigma=sigma, method="chol")


and benchmark 

system.time(mvtnorm::dmvnorm(x,t(1:2),.2+diag(2),F))
   user  system elapsed 
   0.05    0.02    0.06 

system.time(dmvnorm(x,t(1:2),.2+diag(2),F))
   user  system elapsed 
   0.12    0.02    0.14 


No!!!!!! :-(

[EDIT]

The questions are:
1) Why is the RcppArmadillo implementation slower than a plain R implementation?
2) How do I create an Rcpp/RcppArmadillo implementation that beats the R implementation?

[EDIT 2]

I put in the mahalanobis_arma into the mvtnorm::dmvnorm function and it also slows down.

dickoa · Accepted Answer

If you want a faster implementation of the mahalanobis distance, you just have to re-write your algorithm and mimic the one used by R. It's pretty straightforward

I modified a little bit your function mahalanobis_arma to turn mu to a rowvec.

Basically I just translated the R code to RcppArmadillo

mahalanobis
function (x, center, cov, inverted = FALSE, ...) 
{
    x <- if (is.vector(x)) 
        matrix(x, ncol = length(x))
    else as.matrix(x)
    x <- sweep(x, 2, center)
    if (!inverted) 
        cov <- solve(cov, ...)
    setNames(rowSums((x %*% cov) * x), rownames(x))
}

Here it is

#include 
#include 

// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
arma::vec Mahalanobis(arma::mat x, arma::rowvec center, arma::mat cov){
    int n = x.n_rows;
    arma::mat x_cen;
    x_cen.copy_size(x);
    for (int i=0; i < n; i++) {
        x_cen.row(i) = x.row(i) - center;
    }
    return sum((x_cen * cov.i()) % x_cen, 1);    
}


// [[Rcpp::export]]
arma::vec mahalanobis_arma( arma::mat x ,  arma::rowvec mu, arma::mat sigma ){

  int n = x.n_rows;
  arma::vec md(n);
    for (int i=0; i



Now, let's compare this new armadillo version (Mahalanobis), your first version (mahalanobis_arma) and the R implementation (mahalanobis). 

I save this Cpp code as mahalanobis.cpp

require(RcppArmadillo)
sourceCpp("mahalanobis.cpp")

set.seed(1)
x <- matrix(rnorm(10000 * 10), ncol = 10)
Sx <- cov(x)


all.equal(c(Mahalanobis(x, colMeans(x), Sx))
          ,mahalanobis(x, colMeans(x), Sx))
## [1] TRUE

all.equal(mahalanobis_arma(x, colMeans(x), Sx)
          ,Mahalanobis(x, colMeans(x), Sx))
## [1] TRUE


require(rbenchmark)
benchmark(Mahalanobis(x, colMeans(x), Sx),
          mahalanobis(x, colMeans(x), Sx),
          mahalanobis_arma(x, colMeans(x), Sx),
          order = "elapsed")


##                                   test replications elapsed
## 1      Mahalanobis(x, colMeans(x), Sx)          100   0.124
## 2      mahalanobis(x, colMeans(x), Sx)          100   0.741
## 3 mahalanobis_arma(x, colMeans(x), Sx)          100   4.509
##   relative user.self sys.self user.child sys.child
## 1    1.000     0.173    0.077          0         0
## 2    5.976     0.804    0.670          0         0
## 3   36.363     4.386    4.626          0         0


As you can see the new implementation is faster than the R one. 
I'm pretty sure that we can do better here by using cholesky decomposition to solve the covariance matrix or by using other matrix decomposition.

Finally, we can just plug this Mahalanobis function into your dmvnorm and test it :

require(mvtnorm)
set.seed(1)
sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
x <- rmvnorm(n = 5000000, mean = c(1, 2), sigma = sigma, method = "chol")


all.equal(mvtnorm::dmvnorm(x, t(1:2), .2 + diag(2), FALSE),
          c(dmvnorm(x, t(1:2), .2+diag(2), FALSE)))
## [1] TRUE

benchmark(mvtnorm::dmvnorm(x, t(1:2), .2 + diag(2), FALSE),
          dmvnorm(x, t(1:2), .2+diag(2), FALSE),
          order = "elapsed")

##                                                test replications
## 2          dmvnorm(x, t(1:2), 0.2 + diag(2), FALSE)          100
## 1 mvtnorm::dmvnorm(x, t(1:2), 0.2 + diag(2), FALSE)          100
##   elapsed relative user.self sys.self user.child sys.child
## 2  35.366    1.000    31.117    4.193          0         0
## 1  60.770    1.718    56.666   13.236          0         0


It almost twice as fast now.

dmvnorm MVN density - RcppArmadillo implementation slower than R package including a bit of Fortran

Answers (1)

Related Questions