guy

Reputation: 436

Memory leak in third-party library in R

I am currently running a simulation experiment in R using a third-party package (the package is iRF, but in principle it doesn't matter which package it is) which appears to have a memory leak. A small example reproducing the problem:

library(zeallot)
library(iRF)

# simulate a small regression problem: 300 observations, 50 predictors,
# with only the first two predictors informative
simulate_data <- function() {
  X <- matrix(runif(300 * 50), nrow = 300)
  Y <- X[,1] + X[,2] + rnorm(nrow(X))
  return(list(X = X, Y = Y))
}

for (i in 1:10) {
  c(X, Y) %<-% simulate_data()  # zeallot's multi-assignment
  fit <- iRF(X, Y)
  rm(fit)                       # discard the fit ...
  gc()                          # ... and force garbage collection
}

This uses just over 1 GB of RAM, even though rm(fit) and gc() run on every iteration. The package in question makes use of compiled C code, and presumably the memory leak is occurring there; hence, I cannot straightforwardly free the memory from R. The question is: is there any way to get around this memory leak without restarting my R session? I'm not sure if this makes sense (I'm an ignorant statistician), but is there some way to nuke everything in the C world as though I had reset the session? It is extremely inconvenient that, if I want to replicate the experiment 1000 times, I will have to keep rebooting R or run out of memory.

Upvotes: 1

Views: 281

Answers (2)

guy

Reputation: 436

Following @r2evans's advice, the issue can be bypassed using the parallel package: each fit runs in a disposable forked worker, so the leaked memory belongs to the child process and is returned to the operating system when the cluster is stopped. The following code does not suffer from the memory leak:

library(zeallot)
library(iRF)
library(parallel)

simulate_data <- function() {
  X <- matrix(runif(300 * 50), nrow = 300)
  Y <- X[,1] + X[,2] + rnorm(nrow(X))
  return(list(X = X, Y = Y))
}

f <- function(i) {
  c(X, Y) %<-% simulate_data()
  return(iRF(X, Y))
}

for (i in 1:10) {
  # a fresh single-worker fork per iteration; note that "FORK" clusters
  # require a unix-alike and are not available on Windows
  cl <- makeCluster(1, type = "FORK")
  fit <- clusterApply(cl, 1, f)[[1]]
  stopCluster(cl)  # killing the worker releases the leaked memory
}

Upvotes: 1

r2evans

Reputation: 160437

If you cannot fix the source, then your only option is to contain the problem. If the calculations can be broken into smaller components, you have a couple of options:

  1. calculate what you can, save into .rda files, restart R, continue (see the sketch after this list); or

  2. use a parallelization scheme such as future or parallel::parLapplyLB to farm out the processing into subordinate R sessions, capture the output, and allow the child processes to close.
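Option 2 is essentially what the other answer demonstrates. For option 1, a minimal sketch of the save-and-restart pattern might look like the following; the script names, chunk size, and output file names are illustrative assumptions, and it uses saveRDS/.rds files rather than save/.rda for convenience:

## run_chunk.R -- run one chunk of replications in a fresh R session,
## then exit, so the OS reclaims any memory leaked by the C code
library(zeallot)
library(iRF)

simulate_data <- function() {
  X <- matrix(runif(300 * 50), nrow = 300)
  Y <- X[,1] + X[,2] + rnorm(nrow(X))
  return(list(X = X, Y = Y))
}

chunk <- as.integer(commandArgs(trailingOnly = TRUE)[1])
fits <- lapply(1:10, function(i) {
  c(X, Y) %<-% simulate_data()
  iRF(X, Y)
})
saveRDS(fits, sprintf("fits_chunk_%03d.rds", chunk))

## driver.R -- each Rscript call is a fresh process, so the leak
## never accumulates across chunks
for (chunk in 1:100) {
  system2("Rscript", c("run_chunk.R", chunk))
}
## afterwards, collect the saved results into one list:
## all_fits <- unlist(lapply(Sys.glob("fits_chunk_*.rds"), readRDS),
##                    recursive = FALSE)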

Upvotes: 1
