Sebastian

Reputation: 1369

Rcpp: List <-> Matrix conversions by reference?? + Optimizing memory allocation when programming with matrices

As far as I understand it, in Rcpp a Matrix is implemented as a vector with a dimension attribute, while a List is a kind of vector of heterogeneous objects. Is there therefore a trick to convert a List of vectors of equal length (i.e. a DataFrame) into a NumericMatrix (or arma::mat, for that matter) and vice versa by reference, that is, without copying the data column by column into the new data structure?
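To illustrate the dim-attribute point, here is a minimal sketch (the function name as_matrix_view is just for illustration): a single contiguous NumericVector can be reinterpreted as a matrix simply by attaching a dim attribute, with no copy of the data. A List, by contrast, stores each column as a separate vector in its own memory block, so it is not obvious that an analogous zero-copy conversion can exist:

#include <Rcpp.h>
using namespace Rcpp;

// Illustrative only: view one contiguous vector as a matrix by setting its
// "dim" attribute in place; no data are copied. Note that this also attaches
// the attribute to the vector that was passed in from R.
// [[Rcpp::export]]
SEXP as_matrix_view(NumericVector x, int nrow, int ncol) {
  if (nrow * ncol != x.size()) stop("nrow * ncol must equal length(x)");
  x.attr("dim") = Dimension(nrow, ncol); // reinterpret the same memory block
  return x; // a List's columns are separate allocations, so no such shortcut
}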

I presume that this is not possible, as it would be extremely useful for R users and I'm sure I would have come across it. Under this assumption - which means List and Matrix methods need to be implemented separately - my question then becomes: how can Rcpp functions for Lists and Matrices be written so that they are equally efficient? My experience is that Lists are more memory-efficient because they allocate memory as they are filled, while matrices need to be defined and have all their memory allocated beforehand. Consider the example below: I have written two versions of a grouped sum, one for matrices and one for Lists / data.frames:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gsumm(NumericMatrix x, int ng = 0, IntegerVector g = 0, bool fill = false) {
  int n = x.nrow();
  int c = x.ncol();
  NumericMatrix sum(ng, c); // here memory needs to be allocated (and zero-initialized)
  for(int j = c; j--; ) {   // accumulate group sums column by column
    NumericMatrix::Column column = x( _ , j);
    NumericMatrix::Column sumj = sum( _ , j);
    for(int i = n; i--; ) sumj[g[i]-1] += column[i];
  }
  if(fill) {
    NumericMatrix out(n, c); // allocating space for this matrix is the main speed-killer
    for(int j = c; j--; ) {  // expand: replace each element by its group sum
      NumericMatrix::Column colo = out( _ , j);
      NumericMatrix::Column sumj = sum( _ , j);
      for(int i = n; i--; ) colo[i] = sumj[g[i]-1];
    }
    return out;
  } else return sum;
}

// [[Rcpp::export]]
List gsuml(List x, int ng = 0, IntegerVector g = 0, bool fill = false) {
  int l = x.size(), n;
  List sum(l);
  for(int j = l; j--; ) {    // accumulate group sums for each list element
    NumericVector column = x[j];
    n = column.size();
    NumericVector sumj(ng);  // zero-initialized accumulator for column j
    for(int i = n; i--; ) sumj[g[i]-1] += column[i];
    sum[j] = sumj;
  }
  if(fill) for(int j = l; j--; ) { // expand: n is the (common) column length from above
    NumericVector sgj(n);
    NumericVector sumj = sum[j];
    for(int i = n; i--; ) sgj[i] = sumj[g[i]-1];
    sum[j] = sgj;
  }
  return sum;
}

If fill = false, the group-aggregated data is returned, while if fill = true, data of the same dimensions is returned where every element has been replaced by its in-group sum. In both cases the list method is faster, especially for fill = true, where a full empty n x c matrix needs to be created before filling it:

library(microbenchmark)
testm = matrix(rnorm(10000000), ncol = 1000)
testl = as.data.frame(testm)
ng = 1000
g = sample.int(ng, 10000, replace = TRUE)

> microbenchmark(gsumm(testm,ng,g, fill = FALSE),gsuml(testl,ng,g, fill = FALSE))
Unit: milliseconds
                              expr      min       lq     mean   median       uq      max neval cld
 gsumm(testm, ng, g, fill = FALSE) 15.45847 16.28559 17.82400 16.67717 17.41415 63.40689   100   a
 gsuml(testl, ng, g, fill = FALSE) 13.61055 14.12062 16.06388 14.59342 15.45356 96.93972   100   a

> microbenchmark(gsumm(testm,ng,g, fill = TRUE),gsuml(testl,ng,g, fill = TRUE))
Unit: milliseconds
                             expr      min       lq     mean   median       uq      max neval cld
 gsumm(testm, ng, g, fill = TRUE) 34.45835 36.28886 51.42828 39.87513 60.51453 242.2054   100   b
 gsuml(testl, ng, g, fill = TRUE) 29.92314 30.69269 34.83283 31.33239 32.67136 115.8745   100  a 

It would be great if both methods could be brought to the same speed, or potentially even made more efficient. I guess that requires a way of programming with matrices that avoids huge a priori memory allocations. I am grateful for any comments and suggestions!

Upvotes: 1

Views: 281

Answers (1)

Ralf Stubner

Reputation: 26833

The amount of memory allocated in both your methods is the same. You can see this from the mem_alloc column when using bench::mark() for benchmarking:

> bench::mark(gsumm(testm,ng,g, fill = FALSE),gsuml(testl,ng,g, fill = FALSE), check = FALSE)
# A tibble: 2 x 13
  expression                           min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr>                        <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 gsumm(testm, ng, g, fill = FALSE) 14.1ms 15.1ms      64.7    7.63MB     0       33     0      510ms <dbl …
2 gsuml(testl, ng, g, fill = FALSE) 12.5ms 15.1ms      67.0    7.68MB     4.19    32     2      478ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>

> bench::mark(gsumm(testm,ng,g, fill = TRUE),gsuml(testl,ng,g, fill = TRUE), check = FALSE)
# A tibble: 2 x 13
  expression                          min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr>                       <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 gsumm(testm, ng, g, fill = TRUE) 39.2ms 45.6ms      20.0    83.9MB     20.0     5     5      250ms <dbl …
2 gsuml(testl, ng, g, fill = TRUE) 30.3ms   32ms      26.7      84MB     20.0     8     6      299ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>

However, the memory is not only allocated, which is fast anyway, but also initialized to zero everywhere. This is unnecessary in your case and can be avoided by replacing Rcpp::NumericMatrix mat(rows, cols) with Rcpp::NumericMatrix mat = Rcpp::no_init(rows, cols) as well as Rcpp::NumericVector vec(length) with Rcpp::NumericVector vec = Rcpp::no_init(length). If I do this with your code, both functions profit:
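For concreteness, here is a minimal sketch of what this looks like for the fill step of gsumm (the helper name gfill is purely illustrative). Skipping the zero-fill is safe for buffers such as out and sgj that are overwritten in full; an accumulator that relies on starting at zero via +=, like sum, still needs ordinary initialization:

#include <Rcpp.h>
using namespace Rcpp;

// Sketch of the no_init variant of the fill step: the output buffer is
// allocated without being zeroed, which is fine because every element is
// assigned in the loop below.
// [[Rcpp::export]]
NumericMatrix gfill(NumericMatrix sum, IntegerVector g, int n) {
  int c = sum.ncol();
  NumericMatrix out = no_init(n, c);     // allocate, but skip the zero-fill
  for(int j = c; j--; ) {
    NumericMatrix::Column colo = out( _ , j);
    NumericMatrix::Column sumj = sum( _ , j);
    for(int i = n; i--; ) colo[i] = sumj[g[i]-1]; // overwrites every element
  }
  return out;
}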

> bench::mark(gsumm(testm,ng,g, fill = FALSE),gsuml(testl,ng,g, fill = FALSE), check = FALSE)
# A tibble: 2 x 13
  expression                           min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr>                        <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 gsumm(testm, ng, g, fill = FALSE)   13ms 14.7ms      67.1    7.63MB     0       34     0      507ms <dbl …
2 gsuml(testl, ng, g, fill = FALSE) 12.8ms 14.6ms      67.4    7.68MB     2.04    33     1      489ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>

> bench::mark(gsumm(testm,ng,g, fill = TRUE),gsuml(testl,ng,g, fill = TRUE), check = FALSE)
# A tibble: 2 x 13
  expression                          min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr>                       <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 gsumm(testm, ng, g, fill = TRUE) 27.5ms   31ms      26.6    83.9MB     10.7    10     4      375ms <dbl …
2 gsuml(testl, ng, g, fill = TRUE) 24.7ms 26.4ms      36.9      84MB     36.9     9     9      244ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>

I am not sure why the matrix version profits more from not initializing the memory, though.

Upvotes: 2
