Comparing RcppArmadillo and R running speed for non-contiguous submatrix selection

Question

My primary goal is to select non-contiguous submatrix using two sets of binary vectors for rows and columns. This is one of many steps I need to do for my MCMC loop which I'm implementing in C++ using Rcpp, RcppArmadillo and RcppEigen.

Three potential ways to do this was (1)using RcppArmadillo, (2) calling my R function from Rcpp and (3) using R directly and pass the results to C++. Although the last option is not convenient for me at all.

I then compared the performance speed of those three scenarios. Interestingly, the direct R code is much faster than the other two! What is more surpising to me is when I call the exact R function from Rcpp, it is much slower than when I call it directly from R. I expected those to have relatively same running speed as was suggested in an example in this older post.

Anyway, the timing results seem a bit strange to me. Any comments on the reason? I use Macbook Pro with El Capitan OS, 2.5 Ghz Intel Core i7. Could it be related to my system, Mac OSX or the way Rcpp is installed on my machine?

Thanks in advance!

Here is the code:

CPP Section:

#include 

// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;

// (1) Using RcppArmadillo functions:
// [[Rcpp::export]]
mat subselect(NumericMatrix X, uvec rows, uvec cols){

  mat XX(X.begin(), X.nrow(),X.ncol(), false);
  mat y = XX.submat(find(rows>0),find(cols>0));
  return (y);
}

// (2) Calling the function from R:
// [[Rcpp::export]]
NumericalMatrix subselect2(NumericMatrix X, NumericVector rows, NumericVector cols){ 

  Environment stats;
  Function submat = stats["submat"];
  NumericMatrix outmat=submat(X,rows,cols);
  return(wrap(outmat));
}

R Section:

library(microbenchmark)

# (3) My R function:
submat <- function(mat,rvec,cvec){
 return(mat[as.logical(rvec),as.logical(cvec)])
}

# Comparing the performances:

// Generating data:
set.seed(432)
rows <- rbinom(1000,1,0.1)
cols <- rbinom(1000,1,0.1)
amat <- matrix(1:1e06,1000,1000)

//benchmarking:
microbenchmark(subselect(amat,rows,cols),
           subselect2(amat,rows,cols),
           submat(amat,rows,cols))

Results:

                         expr     min       lq     mean    median       uq        max   neval   

  subselect(amat, rows, cols) 893.670 1566.882 2297.991  1675.282 2184.783   8462.142     100
 subselect2(amat, rows, cols) 928.418 1581.553 3554.805  1657.454 2060.837 138801.050     100
     submat(amat, rows, cols)  36.313   55.748   66.782    62.709   73.975    136.970     100

nrussell · Accepted Answer

There are a couple of things worth addressing here. First, you made a subtle mistake in the design of your benchmark that had a significant impact on the performance on your Armadillo function, subselect. Observe:

set.seed(432)
rows <- rbinom(1000, 1, 0.1)
cols <- rbinom(1000, 1, 0.1)

imat <- matrix(1:1e6, 1000, 1000)
nmat <- imat + 0.0

storage.mode(imat)
# [1] "integer"

storage.mode(nmat)
# [1] "double"

microbenchmark(
    "imat" = subselect(imat, rows, cols),
    "nmat" = subselect(nmat, rows, cols)
)
# Unit: microseconds
#  expr      min       lq      mean    median        uq       max neval
#  imat 3088.140 3218.013 4355.2956 3404.4685 4585.1095 21662.540   100
#  nmat  139.298  167.116  223.2271  209.4585  238.6875   533.035   100

Although R often treats integer literals (e.g. 1, 2, 3, ...) as floating point values, the sequence operator : is one of the few exceptions to this,

storage.mode(c(1, 2, 3, 4, 5))
# [1] "double"

storage.mode(1:5)
# [1] "integer"

which is why the expression matrix(1:1e6, 1000, 1000) returns an integer matrix, not a numeric matrix. This is problematic because subselect is expecting a NumericMatrix, not an IntegerMatrix, and passing it the latter type triggers a deep copy, hence the difference of more than an order of magnitude in the above benchmark.

Second, there is a noticeable difference between the relative performance of the R function submat and the C++ function subselect over the distribution of your binary indexing vectors, which is presumably due to a difference in underlying algorithms. For more sparse indexing (a greater proportion of 0s than 1s), the R function wins out; and for more dense indexing, the opposite is true. This also seems to be a function of matrix size (or possibly just dimension), as shown in the plots below, where row and column index vectors are generated using rbinom with success parameters 0.0, 0.05, 0.10, ..., 0.95, 1.0 -- first with a 1e3 x 1e3 matrix, and then with a 1e3 x 1e4 matrix. The code for this is included at the end.

Benchmark code:

library(data.table)
library(microbenchmark)
library(ggplot2)

test_data <- function(nr, nc, p, seed = 123) {
    set.seed(seed)
    list(
        x = matrix(rnorm(nr * nc), nr, nc),
        rv = rbinom(nr, 1, p),
        cv = rbinom(nc, 1, p)
    )
}

tests <- lapply(seq(0, 1, 0.05), function(p) {
    lst <- test_data(1e3, 1e3, p)
    list(
        p = p,
        benchmark = microbenchmark::microbenchmark(
            R = submat(lst[[1]], lst[[2]], lst[[3]]),
            Arma = subselect(lst[[1]], lst[[2]], lst[[3]])
        )
    )
})

gt <- rbindlist(
    Map(function(g) {
        data.table(g[[2]])[
            ,.(Median.us = median(time / 1000)), 
            by = .(Expr = expr)
        ][order(Median.us)][
            ,Relative := Median.us / min(Median.us)
        ][,pSuccess := sprintf("%3.2f", g[[1]])]
    }, tests)
)

ggplot(gt) +
    geom_point(
        aes(
            x = pSuccess, 
            y = Relative, 
            color = Expr
        ),
        size = 2,
        alpha = 0.75
    ) +
    theme_bw() +
    ggtitle("1e3 x 1e3 Matrix")

## change `test_data(1e3, 1e3, p)` to
## `test_data(1e3, 1e4, p)` inside of 
## `tests <- lapply(...) ...` to generate 
## the second plot

Comparing RcppArmadillo and R running speed for non-contiguous submatrix selection

Answers (1)

Related Questions