Reputation: 519
The solution is now online in the Rcpp Gallery
I re-implemented dmvnorm from the mvtnorm package in RcppArmadillo. I somehow like Armadillo, but I guess it would also work in plain Rcpp. The approach from dmvnorm is based on the mahalanobis distance, so I have a function for that and then the multivariate normal density function.
Let me show you my code:
#include <RcppArmadillo.h>
#include <Rcpp.h>
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
arma::vec mahalanobis_arma( arma::mat x , arma::mat mu, arma::mat sigma ){
int n = x.n_rows;
arma::vec md(n);
for (int i=0; i<n; i++){
arma::mat x_i = x.row(i) - mu;
arma::mat Y = arma::solve( sigma, arma::trans(x_i) );
md(i) = arma::as_scalar(x_i * Y);
}
return md;
}
// [[Rcpp::export]]
arma::vec dmvnorm ( arma::mat x, arma::mat mean, arma::mat sigma, bool log){
arma::vec distval = mahalanobis_arma(x, mean, sigma);
double logdet = sum(arma::log(arma::eig_sym(sigma)));
double log2pi = 1.8378770664093454835606594728112352797227949472755668;
arma::vec logretval = -( (x.n_cols * log2pi + logdet + distval)/2 ) ;
if(log){
return(logretval);
}else {
return(exp(logretval));
}
}
So, and not to my big disappointment:
simulate some data
sigma <- matrix(c(4,2,2,3), ncol=2)
x <- rmvnorm(n=5000000, mean=c(1,2), sigma=sigma, method="chol")
and benchmark
system.time(mvtnorm::dmvnorm(x,t(1:2),.2+diag(2),F))
user system elapsed
0.05 0.02 0.06
system.time(dmvnorm(x,t(1:2),.2+diag(2),F))
user system elapsed
0.12 0.02 0.14
No!!!!!! :-(
[EDIT]
The questions are: 1) Why is the RcppArmadillo implementation slower than a plain R implementation? 2) How do I create an Rcpp/RcppArmadillo implementation that beats the R implementation?
[EDIT 2]
I put in the mahalanobis_arma into the mvtnorm::dmvnorm function and it also slows down.
Upvotes: 3
Views: 1682
Reputation: 18437
If you want a faster implementation of the mahalanobis distance, you just have to re-write your algorithm and mimic the one used by R. It's pretty straightforward
I modified a little bit your function mahalanobis_arma
to turn mu
to a rowvec
.
Basically I just translated the R code to RcppArmadillo
mahalanobis
function (x, center, cov, inverted = FALSE, ...)
{
x <- if (is.vector(x))
matrix(x, ncol = length(x))
else as.matrix(x)
x <- sweep(x, 2, center)
if (!inverted)
cov <- solve(cov, ...)
setNames(rowSums((x %*% cov) * x), rownames(x))
}
<bytecode: 0x6e5b408>
<environment: namespace:stats>
Here it is
#include <RcppArmadillo.h>
#include <Rcpp.h>
// [[Rcpp::depends("RcppArmadillo")]]
// [[Rcpp::export]]
arma::vec Mahalanobis(arma::mat x, arma::rowvec center, arma::mat cov){
int n = x.n_rows;
arma::mat x_cen;
x_cen.copy_size(x);
for (int i=0; i < n; i++) {
x_cen.row(i) = x.row(i) - center;
}
return sum((x_cen * cov.i()) % x_cen, 1);
}
// [[Rcpp::export]]
arma::vec mahalanobis_arma( arma::mat x , arma::rowvec mu, arma::mat sigma ){
int n = x.n_rows;
arma::vec md(n);
for (int i=0; i<n; i++){
arma::mat x_i = x.row(i) - mu;
arma::mat Y = arma::solve( sigma, arma::trans(x_i) );
md(i) = arma::as_scalar(x_i * Y);
}
return md;
}
Now, let's compare this new armadillo version (Mahalanobis
), your first version (mahalanobis_arma
) and the R implementation (mahalanobis
).
I save this Cpp code as mahalanobis.cpp
require(RcppArmadillo)
sourceCpp("mahalanobis.cpp")
set.seed(1)
x <- matrix(rnorm(10000 * 10), ncol = 10)
Sx <- cov(x)
all.equal(c(Mahalanobis(x, colMeans(x), Sx))
,mahalanobis(x, colMeans(x), Sx))
## [1] TRUE
all.equal(mahalanobis_arma(x, colMeans(x), Sx)
,Mahalanobis(x, colMeans(x), Sx))
## [1] TRUE
require(rbenchmark)
benchmark(Mahalanobis(x, colMeans(x), Sx),
mahalanobis(x, colMeans(x), Sx),
mahalanobis_arma(x, colMeans(x), Sx),
order = "elapsed")
## test replications elapsed
## 1 Mahalanobis(x, colMeans(x), Sx) 100 0.124
## 2 mahalanobis(x, colMeans(x), Sx) 100 0.741
## 3 mahalanobis_arma(x, colMeans(x), Sx) 100 4.509
## relative user.self sys.self user.child sys.child
## 1 1.000 0.173 0.077 0 0
## 2 5.976 0.804 0.670 0 0
## 3 36.363 4.386 4.626 0 0
As you can see the new implementation is faster than the R one. I'm pretty sure that we can do better here by using cholesky decomposition to solve the covariance matrix or by using other matrix decomposition.
Finally, we can just plug this Mahalanobis
function into your dmvnorm
and test it :
require(mvtnorm)
set.seed(1)
sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
x <- rmvnorm(n = 5000000, mean = c(1, 2), sigma = sigma, method = "chol")
all.equal(mvtnorm::dmvnorm(x, t(1:2), .2 + diag(2), FALSE),
c(dmvnorm(x, t(1:2), .2+diag(2), FALSE)))
## [1] TRUE
benchmark(mvtnorm::dmvnorm(x, t(1:2), .2 + diag(2), FALSE),
dmvnorm(x, t(1:2), .2+diag(2), FALSE),
order = "elapsed")
## test replications
## 2 dmvnorm(x, t(1:2), 0.2 + diag(2), FALSE) 100
## 1 mvtnorm::dmvnorm(x, t(1:2), 0.2 + diag(2), FALSE) 100
## elapsed relative user.self sys.self user.child sys.child
## 2 35.366 1.000 31.117 4.193 0 0
## 1 60.770 1.718 56.666 13.236 0 0
It almost twice as fast now.
Upvotes: 8