Vanja

Reputation: 21

R: importing data.table package namespace, unexplainable jump in memory consumption

I use the data.table package inside my own package, and I import the data.table namespace in the NAMESPACE and DESCRIPTION files. In one of my functions I use the data.table() function to convert a data.frame into a data.table:

dt <- data.table(df)

But when I call my function, memory usage jumps instantly at the call to data.table() and R stops responding. The code within the function works fine, with low memory consumption, when I run it line by line. Also, if I put library(data.table) inside my function, everything is fine. I was trying to avoid calling library(data.table) in my function and to declare a dependency instead, but something seems to go wrong that way. I am running R-2.14.0 on Mac OS X 10.6.8.

Can anybody explain what the reason could be, and how I can fix it (without using library(data.table) within my function)?

Upvotes: 2

Views: 2915

Answers (1)

Matt Dowle

Reputation: 59612

Some random guesses, in no particular order:

Try using the Imports or Depends field in DESCRIPTION only. I don't think you need to import in NAMESPACE as well, but I might be wrong. I don't know why that would explain the memory use, though.
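For reference, a rough sketch of the pieces being discussed. The file fragments are shown as comments, and to_dt is a hypothetical function name, not something from the question. Depends attaches data.table to the search path whenever your package is attached (much like library() would), while Imports only loads it.

```r
# DESCRIPTION, variant 1: load data.table without attaching it
#   Imports: data.table
#
# DESCRIPTION, variant 2: attach data.table whenever your package is attached
#   Depends: data.table
#
# NAMESPACE directive that the question says is already present
#   import(data.table)

# Hypothetical function inside the package (the name is an assumption).
# Inside a built package, as.data.table is found through the package's imports.
to_dt <- function(df) {
  as.data.table(df)
}
```

Switching between the two DESCRIPTION variants and rebuilding would show whether the problem only appears when data.table is loaded but not attached.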

What is df? Is it big or somehow recursive or strange in some way? Please provide str(df) to tell us something about it, if possible.
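Something along these lines would give a quick picture of df. The small data.frame here is only a made-up stand-in so the snippet runs on its own; use your real object instead.

```r
# Hypothetical stand-in for the real df; replace with your own object.
df <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))

str(df)                                # column names, types and first values
dim(df)                                # number of rows and columns
print(object.size(df), units = "Kb")   # approximate memory footprint
sapply(df, class)                      # list or nested data.frame columns stand out here
```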

Try as.data.table(df), which is faster than data.table(df). But it sounds like your problem is something different.
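A rough way to compare the two conversions on your own machine. The data here is made up, and the timings will depend on your version of R and data.table.

```r
library(data.table)

# Made-up data.frame, only for timing purposes.
df <- data.frame(x = seq_len(1e6), y = rnorm(1e6))

system.time(dt1 <- data.table(df))      # conversion via the constructor
system.time(dt2 <- as.data.table(df))   # conversion via the as. method

identical(dim(dt1), dim(dt2))           # both should hold the same rows and columns
```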

Is your function call being called repeatedly? I can see why repeatedly converting df to dt would use up memory, but not why just calling library(data.table) would make that fast.

Try starting R with R --vanilla to ensure that no .RData (which may include functions masking data.table's) is being loaded on startup, amongst other things. If you have developed your own package, then some kind of function name conflict, or the order of packages on the search() path, sounds plausible.
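A few base R calls that might help spot masking or a leftover workspace. This is only a sketch of where to look, run from the session in which the problem occurs.

```r
search()                       # attached packages and environments, in lookup order

hits <- find("data.table")     # places on the search path that define "data.table"
print(hits)                    # more than one entry would suggest masking

conflicts(detail = TRUE)       # all names defined in more than one attached place

ls(globalenv())                # objects in the workspace, e.g. restored from .RData
```

If find() reports ".GlobalEnv" ahead of "package:data.table", something in the workspace is shadowing the package's function.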

Otherwise we'll need more information please. I don't recall anything similar to this happening to me, or being reported before.

And which version of data.table are you using? There is this bug fix in v1.8.1 on R-Forge (not yet on CRAN):

  • Moved data.table setup code from .onAttach to .onLoad so that it is also run when data.table is simply imported from within a package, fixing #1916 related to missing data.table options.

But if you are using 1.8.0 from CRAN, and are importing (only) rather than depending, then I'd expect you to get an error about missing options rather than a jump in memory consumption.
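To check the version, and whether any data.table options are set after the package has only been loaded (not attached), something like this should do. Which option names appear depends on the data.table version, so none are assumed here; run it in a fresh session.

```r
print(packageVersion("data.table"))       # installed version

# Load the namespace without attaching it, roughly what an import does.
invisible(loadNamespace("data.table"))

# Collect whatever options start with "datatable.".
dt_opts <- options()
dt_opts <- dt_opts[grep("^datatable\\.", names(dt_opts))]
print(names(dt_opts))
```

Going by the bug fix quoted above, an empty result after a load without attach would point to the setup code not having run.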

Upvotes: 3
