Reputation: 1369
As far as I understand it, in Rcpp a Matrix is implemented as a vector with a dimension attribute, while a List is a kind of vector of different objects. Is there thus a trick to convert a List of vectors of equal length (i.e. a DataFrame) into a NumericMatrix (or arma::mat, for that matter) and vice versa by reference, that is, without copying the data column by column into the new data structure?
I presume that this is not possible, as it would be extremely useful for R users and I'm sure I would have come across it. Under this assumption - which means List and Matrix methods need to be implemented separately - my question then becomes how to write Rcpp functions for Lists and Matrices that are equally efficient. My experience is that Lists are more memory-efficient because they automatically allocate memory as they are filled, while matrices need to be defined and all memory allocated beforehand. Consider the example below: I have written two versions of a grouped sum, one for matrices and one for Lists / data.frames:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gsumm(NumericMatrix x, int ng = 0, IntegerVector g = 0, bool fill = false) {
  int n = x.nrow();
  int c = x.ncol();
  NumericMatrix sum(ng, c); // here memory needs to be allocated
  for(int j = c; j--; ) {
    NumericMatrix::Column column = x( _ , j);
    NumericMatrix::Column sumj = sum( _ , j);
    for(int i = n; i--; ) sumj[g[i]-1] += column[i];
  }
  if(fill) {
    NumericMatrix out(n, c); // allocating space for this matrix is the main speed-killer
    for(int j = c; j--; ) {
      NumericMatrix::Column colo = out( _ , j);
      NumericMatrix::Column sumj = sum( _ , j);
      for(int i = n; i--; ) colo[i] = sumj[g[i]-1];
    }
    return out;
  } else return sum;
}
// [[Rcpp::export]]
List gsuml(List x, int ng = 0, IntegerVector g = 0, bool fill = false) {
  int l = x.size(), n;
  List sum(l);
  for(int j = l; j--; ) {
    NumericVector column = x[j];
    n = column.size();
    NumericVector sumj(ng);
    for(int i = n; i--; ) sumj[g[i]-1] += column[i];
    sum[j] = sumj;
  }
  if(fill) for(int j = l; j--; ) {
    NumericVector sgj(n);
    NumericVector sumj = sum[j];
    for(int i = n; i--; ) sgj[i] = sumj[g[i]-1];
    sum[j] = sgj;
  }
  return sum;
}
If fill = false, the group-aggregated data is returned, while if fill = true, data of the same dimensions is returned where every element is replaced by its in-group sum. In both cases the list method is faster, especially if fill = true, where a full empty n x c matrix needs to be created before filling it up:
library(microbenchmark)
testm = matrix(rnorm(10000000), ncol = 1000)
testl = as.data.frame(testm)
ng = 1000
g = sample.int(ng, 10000, replace = TRUE)
> microbenchmark(gsumm(testm,ng,g, fill = FALSE),gsuml(testl,ng,g, fill = FALSE))
Unit: milliseconds
                              expr      min       lq     mean   median       uq      max neval cld
 gsumm(testm, ng, g, fill = FALSE) 15.45847 16.28559 17.82400 16.67717 17.41415 63.40689   100   a
 gsuml(testl, ng, g, fill = FALSE) 13.61055 14.12062 16.06388 14.59342 15.45356 96.93972   100   a
> microbenchmark(gsumm(testm,ng,g, fill = TRUE),gsuml(testl,ng,g, fill = TRUE))
Unit: milliseconds
expr min lq mean median uq max neval cld
gsumm(testm, ng, g, fill = TRUE) 34.45835 36.28886 51.42828 39.87513 60.51453 242.2054 100 b
gsuml(testl, ng, g, fill = TRUE) 29.92314 30.69269 34.83283 31.33239 32.67136 115.8745 100 a
It would be great if both methods could be brought to the same speed, or potentially even made more efficient. I guess for that a way of programming with matrices that avoids huge a-priori memory allocations needs to be devised. I am grateful for any comments and suggestions!
Upvotes: 1
Views: 281
Reputation: 26833
The amount of memory allocated in both your methods is the same. You can see this from the mem_alloc column when using bench::mark() for benchmarking:
> bench::mark(gsumm(testm,ng,g, fill = FALSE),gsuml(testl,ng,g, fill = FALSE), check = FALSE)
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result
<bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list>
1 gsumm(testm, ng, g, fill = FALSE) 14.1ms 15.1ms 64.7 7.63MB 0 33 0 510ms <dbl …
2 gsuml(testl, ng, g, fill = FALSE) 12.5ms 15.1ms 67.0 7.68MB 4.19 32 2 478ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>
> bench::mark(gsumm(testm,ng,g, fill = TRUE),gsuml(testl,ng,g, fill = TRUE), check = FALSE)
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result
<bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list>
1 gsumm(testm, ng, g, fill = TRUE) 39.2ms 45.6ms 20.0 83.9MB 20.0 5 5 250ms <dbl …
2 gsuml(testl, ng, g, fill = TRUE) 30.3ms 32ms 26.7 84MB 20.0 8 6 299ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>
However, the memory is not only allocated, which is fast anyway, but also initialized with zero everywhere. This is unnecessary in your case and can be avoided by replacing Rcpp::NumericMatrix mat(rows, cols) with Rcpp::NumericMatrix mat = Rcpp::no_init(rows, cols), as well as Rcpp::NumericVector vec(length) with Rcpp::NumericVector vec = Rcpp::no_init(length).
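For illustration, here is a minimal sketch of gsumm() with this change, assuming no_init is applied only to out, which gets completely overwritten anyway (the accumulator sum is filled with += and therefore still needs to start at zero; the analogous change in gsuml() would be NumericVector sgj = no_init(n)):

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix gsumm(NumericMatrix x, int ng = 0, IntegerVector g = 0, bool fill = false) {
  int n = x.nrow();
  int c = x.ncol();
  NumericMatrix sum(ng, c);            // accumulated with +=, so keep the zero-initialization
  for(int j = c; j--; ) {
    NumericMatrix::Column column = x( _ , j);
    NumericMatrix::Column sumj = sum( _ , j);
    for(int i = n; i--; ) sumj[g[i]-1] += column[i];
  }
  if(fill) {
    NumericMatrix out = no_init(n, c); // every cell is overwritten below, so skip the zeroing
    for(int j = c; j--; ) {
      NumericMatrix::Column colo = out( _ , j);
      NumericMatrix::Column sumj = sum( _ , j);
      for(int i = n; i--; ) colo[i] = sumj[g[i]-1];
    }
    return out;
  } else return sum;
}

If I do this with your code, both functions profit: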
> bench::mark(gsumm(testm,ng,g, fill = FALSE),gsuml(testl,ng,g, fill = FALSE), check = FALSE)
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result
<bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list>
1 gsumm(testm, ng, g, fill = FALSE) 13ms 14.7ms 67.1 7.63MB 0 34 0 507ms <dbl …
2 gsuml(testl, ng, g, fill = FALSE) 12.8ms 14.6ms 67.4 7.68MB 2.04 33 1 489ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>
> bench::mark(gsumm(testm,ng,g, fill = TRUE),gsuml(testl,ng,g, fill = TRUE), check = FALSE)
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result
<bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list>
1 gsumm(testm, ng, g, fill = TRUE) 27.5ms 31ms 26.6 83.9MB 10.7 10 4 375ms <dbl …
2 gsuml(testl, ng, g, fill = TRUE) 24.7ms 26.4ms 36.9 84MB 36.9 9 9 244ms <list…
# … with 3 more variables: memory <list>, time <list>, gc <list>
I am not sure why the matrix version profits more from not initializing the memory, though.
Upvotes: 2