Reputation: 1641
It is well known that implementations in Rcpp will, in general, be much faster than implementations in R. I am interested in whether there are good practices to speed up single evaluations of Rcpp functions that have to be evaluated within a R loop.
Consider the following example where I use a simple multivariate normal generating function in Rcpp:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace arma;
using namespace Rcpp;
// [[Rcpp::export]]
mat mvrnormArma(int n, mat sigma) {
int ncols = sigma.n_cols;
mat Y = randn(n, ncols);
return Y * chol(sigma);
}
Assume the goal is to generate 10,000 10-dimensional multivariate normal variables using the following two functions:
PureRcpp = function(n){mvrnormArma(n, diag(10))}
LoopRcpp = function(n){for(ii in 1:n){mvrnormArma(1, diag(10))}}
Here, PureRcpp
is of course the preferable and much faster solution. However, in certain applications it may be necessary to rely on single evaluations of mvrnormArma
in an R loop. This is the approach taken in LoopRcpp
which is certainly the slower solution. I was, however, a bit surprised when i benchmarked these and saw HOW slow the second solution really was:
> microbenchmark::microbenchmark(PureRcpp(10000), LoopRcpp(10000))
Unit: milliseconds
expr min lq mean median uq max neval cld
PureRcpp(10000) 2.236624 2.365988 2.578869 2.435268 2.565488 10.79609 100 a
LoopRcpp(10000) 52.590143 53.315655 58.080897 55.406020 62.264711 80.96275 100 b
Is this massive slow-down just something that we have to live with when we have to work in R loops or are there some possibilities to reduce the overhead that results due to looping? I'm aware that we could just rewrite everything in C++, but the goal is to offer a speedy ''Rcpp within R loop'' solution if possible.
Upvotes: 1
Views: 100
Reputation: 46
As Roland pointed out that is mostly due to function calls. However, you can shave off some time (and get a more accurate comparison) by optimising/adapting your code.
// [[Rcpp::export]]
mat draw_randn(int n, int ncols) {
mat Y = randn(n, ncols);
return(Y);
}
// [[Rcpp::export]]
mat mvrnormArma(mat sigma, mat Y) {
return Y * chol(sigma);
}
// [[Rcpp::export]]
mat mvrnormArma_loop(mat& sigma, rowvec& Y) {
return Y * chol(sigma);
}
And benchmark that.
PureRcpp = function(n) {
Y <- draw_randn(n, 10)
I <- diag(10)
mvrnormArma(I, Y)
}
LoopRcpp = function(n) {
Y <- draw_randn(n, 10)
I <- diag(10)
for(ii in 1:n) {mvrnormArma_loop(I, Y[ii, ])}
}
Shaves off about 10ms for me.
Upvotes: 3