VeilleData

Reputation: 285

R data.table memory efficient rbindlist

I'd like to rbind multiple data.tables in a memory-efficient way.

More precisely, I'd like to rbind them one by one and free memory as I go, so that I can bind n data.tables of size k when my memory is only of size (n+1)*k.

I wrote this function hoping to do that:

rbindlistOneByOne <- function(l, use.names=FALSE, fill=FALSE, idcol=NULL, verbose = FALSE) {
  ll <- length(l)
  # Handle empty lists
  if(ll <= 0) stop("rbindlistOneByOne : empty list")
  # Handle single-element lists
  if(ll <= 1) return(l[[1]])
  # Handle normal lists (ll >= 2)
  current <- l[[1]]
  res <- current
  l[1] <- NULL
  rm(current); gc()
  for(i in 2:ll) {
    current <- l[[1]]
    res <- rbindlist(list(res, current), use.names = use.names, fill = fill, idcol = idcol)
    l[1] <- NULL
    rm(current); gc()
  }
  return(res)
}

The problem is that this function is not memory efficient, even though I thought it would be.

Do you know why? Is it because rm does not free memory, so that the data.table called "current" remains in memory?

Upvotes: 3

Views: 1100

Answers (1)

JRR

Reputation: 3223

There is no way to do exactly what you want. You cannot control when R releases memory: calling gc() may or may not return memory to the operating system, and that is not under the user's control.

From http://adv-r.had.co.nz/memory.html:

Despite what you might have read elsewhere, there’s never any need to call gc() yourself. R will automatically run garbage collection whenever it needs more space; if you want to see when that is, call gcinfo(TRUE). The only reason you might want to call gc() is to ask R to return memory to the operating system. However, even that might not have any effect: older versions of Windows had no way for a program to return memory to the OS.
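To make the quoted point concrete, here is a minimal sketch in base R of what gc() reports before and after an object is removed. The vector size is arbitrary and the variable names are only for illustration.

```r
# Allocate roughly 80 MB of doubles (size chosen arbitrarily for illustration).
x <- numeric(1e7)

before <- gc()   # matrix with rows "Ncells" and "Vcells"; column 2 is used memory in MB
rm(x)            # drop the only reference to the vector
after <- gc()    # force a collection so the change shows up immediately

vcells_mb_before <- before["Vcells", 2]
vcells_mb_after  <- after["Vcells", 2]
cat("Vcells used before:", vcells_mb_before, "MB; after:", vcells_mb_after, "MB\n")
```

The drop shown here is in R's own heap accounting. Whether the process actually hands those pages back to the operating system depends on the platform and allocator, which is the part you cannot control.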

In addition, calling gc() is extremely slow. Here is a benchmark of your function with and without the gc() calls, for a list of 1000 tables of 10 rows each:

  • without gc: 8 ms
  • with gc: 7 s
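A rough way to reproduce this kind of comparison is sketched below. It is not the exact benchmark code used for the numbers above: `bind_one_by_one` and its `call_gc` flag are hypothetical names, the function is a simplified version of the one in the question, and the timings will differ by machine.

```r
library(data.table)

# Simplified one-by-one binder; call_gc toggles the explicit gc() after each step.
bind_one_by_one <- function(l, call_gc = FALSE) {
  stopifnot(length(l) >= 1)
  res <- l[[1]]
  for (tbl in l[-1]) {
    res <- rbindlist(list(res, tbl))
    if (call_gc) gc()
  }
  res
}

set.seed(1)
# 1000 tables of 10 rows each, as in the benchmark described above.
tables <- lapply(1:1000, function(i) data.table(id = i, x = runif(10)))

t_no_gc   <- system.time(r1 <- bind_one_by_one(tables, call_gc = FALSE))
t_with_gc <- system.time(r2 <- bind_one_by_one(tables, call_gc = TRUE))

# Elapsed wall-clock time in seconds for each variant.
print(c(no_gc = t_no_gc[["elapsed"]], with_gc = t_with_gc[["elapsed"]]))
```

Both variants produce the same table; only the time spent in garbage collection changes.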

rbindlist is the most efficient way to bind data.tables, so call it once on the whole list.
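For reference, a single rbindlist call already accepts the use.names, fill and idcol arguments that the wrapper in the question passes through. A small sketch with made-up tables (the names `a`, `b` and the column `source` are only for illustration):

```r
library(data.table)

# Two toy tables with different column order and one extra column.
tables <- list(
  a = data.table(x = 1:2, y = c("p", "q")),
  b = data.table(y = "r", x = 3L, z = TRUE)
)

# Match columns by name, fill missing ones with NA,
# and record which list element each row came from.
res <- rbindlist(tables, use.names = TRUE, fill = TRUE, idcol = "source")
print(res)
```

Because the list is named, the idcol column holds the element names rather than integer positions.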

Upvotes: 1
