slowowl

Reputation: 694

Memory efficient way to use sapply() in R

I am trying to reduce the memory consumption of a piece of R code I have been working on, using the peakRAM() function to measure the maximum RAM used. The code is long and ends with a simple sapply() call, and I have found that it is this sapply() part that consumes the most memory. So I have written a small function fun1() that imitates the objects and the sapply() call from that part of my code:

library(peakRAM)
fun1 <- function() {
  tm <- matrix(1, nrow = 300, ncol = 10)  #in the original code, the entries are different and nonzero
  print(object.size(tm))
  r <- sapply(1:20000, function(i) {
        colSums(tm[1:200,])  #in the original code, I am subsetting a 200 length vector which varies with i, stored in a list of length 20000
        })
  print(object.size(r))
  r
}

peakRAM(fun1())

If you run this in R, peakRAM() reports a consumption of around 330 MB. But the two objects tm and r are both very small (2 KB and 1.6 MB respectively), and the peakRAM() of a single colSums(tm[1:200, ]) call is tiny, around 0.1 MB. So it seems that, during sapply(), R is not releasing memory while looping over 1:20000; otherwise, since a single colSums(tm[1:200, ]) uses very little memory and all the objects involved are small, the sapply() call should also have used very little memory.
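
For example, here is a quick check of that single-call claim, using the same tm as in fun1() (just a sketch; the exact number will vary by machine):

library(peakRAM)
tm <- matrix(1, nrow = 300, ncol = 10)
peakRAM(colSums(tm[1:200, ]))  # Peak_RAM_Used_MiB is tiny, on the order of 0.1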

I know that R has a gc() function that frees unneeded memory when necessary, and I suspect that R is not clearing memory during sapply(), which results in this high memory consumption. If that is true, is there a way to avoid it and complete the job without requiring this much extra memory? Note that I do not wish to compromise on run-time to achieve this.
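
One way to confirm that the peak allocation really happens inside the sapply() loop is to reset R's memory counters before the call and inspect them afterwards (a rough sketch; the reported numbers will vary by machine):

tm <- matrix(1, nrow = 300, ncol = 10)
gc(reset = TRUE)   # reset the "max used" counters
invisible(sapply(1:20000, function(i) colSums(tm[1:200, ])))
gc()               # the "max used" column now reflects the peak reached during the loop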

Upvotes: 3

Views: 309

Answers (1)

Mikael Jagan

Reputation: 11306

Here is your function, modified to use vapply instead of sapply and .colSums instead of colSums:

f1 <- function(x, l) {
    n <- ncol(x)
    ## .colSums is the bare internal version of colSums (no argument checking)
    FUN <- function(i) .colSums(x[i, , drop = FALSE], length(i), n)
    ## vapply fills a result preallocated from the double(n) template
    vapply(l, FUN, double(n), USE.NAMES = FALSE)
}

And here is a C implementation, made accessible to R via the inline package:

sig <- c(x = "double", l = "list")
bod <- '
double *px = REAL(x);
R_xlen_t nx = XLENGTH(x);                      /* total number of elements of x */
int *d = INTEGER(getAttrib(x, R_DimSymbol));
int m = d[0];                                  /* rows of x    */
int n = d[1];                                  /* columns of x */
R_xlen_t N = XLENGTH(l);                       /* number of index vectors */

/* result: one column of n column sums per element of l */
SEXP res = PROTECT(allocMatrix(REALSXP, n, N));
double *pres = REAL(res);

SEXP index;
R_xlen_t nindex;
int *pindex;
double sum;

for (R_xlen_t i = 0, rpos = 0; i < N; ++i)
{
    index = VECTOR_ELT(l, i);
    nindex = XLENGTH(index);
    pindex = INTEGER(index);
    /* columns of x start every m elements in the underlying data */
    for (R_xlen_t xpos = 0; xpos < nx; xpos += m, ++rpos)
    {
        sum = 0.0;
        for (R_xlen_t k = 0; k < nindex; ++k)
        {
            sum += px[xpos + pindex[k] - 1]; /* pindex holds 1-based row indices */
        }
        pres[rpos] = sum;
    }
}
UNPROTECT(1);
return res;
'
f2 <- inline::cfunction(sig, bod, language = "C")

The C code here is quite minimal, so I have stuck to the R API. You can write equivalent C++ code using the Rcpp API, which you might find more approachable.
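
For reference, a minimal sketch of what such an Rcpp port might look like (this is an illustrative assumption, not part of the original answer; the name f3 is arbitrary, and the list is assumed to hold 1-based integer index vectors as in the test below):

Rcpp::cppFunction('
NumericMatrix f3(NumericMatrix x, List l) {
    int n = x.ncol(), N = l.size();
    NumericMatrix res(n, N);                  // one column of column sums per index vector
    for (int i = 0; i < N; ++i) {
        IntegerVector index = l[i];           // 1-based row indices
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < index.size(); ++k)
                sum += x(index[k] - 1, j);    // convert to 0-based
            res(j, i) = sum;
        }
    }
    return res;
}
')

If the sketch compiles, f3(x, l) should agree with f1(x, l) and f2(x, l) from the test below.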

Here is a test showing that f1 and f2 give identical results:

set.seed(1L)
m <- 300L
n <- 10L
x <- matrix(rnorm(m * n), m, n)
l <- replicate(2e+04, sample(m, size = 200L, replace = TRUE), simplify = FALSE)
identical(f1(x, l), f2(x, l))
## [1] TRUE

And here is the result of profiling f1(x, l) and f2(x, l) on my machine:

gc(FALSE)
Rprof("f.out", interval = 1e-05, memory.profiling = TRUE)
f1(x, l)
f2(x, l)
Rprof(NULL)
summaryRprof("f.out", memory = "both")[["by.total"]][c("\"f1\"", "\"f2\""), c("total.time", "mem.total")]
     total.time mem.total
"f1"      0.119     344.4
"f2"      0.001       1.5

The f1 call takes 0.119 seconds and consumes 344.4 MiB of memory. The f2 call takes 0.001 seconds and consumes 1.5 MiB of memory—which is more or less the size of the return value. (Interpret these results with care: Rprof comes with a number of caveats.)

Upvotes: 4
