nbenn

Reputation: 691

Delete an entire data.table by reference

Is it possible to delete a data.table object by reference? How could a function rm_tbl() be implemented that takes a data.table object and assigns NULL to any name pointing to this object in any environment (that is a descendant of the globalenv)?

Examples, which for obvious reasons do not work, but perhaps convey the idea of what I am trying to achieve:

rm_tbl_1 <- function(tbl) {
  rm(tbl)
  invisible(NULL)
}

rm_tbl_2 <- function(tbl) {
  tbl <<- NULL
  invisible(NULL)
}

The following comes close, but is a bit of a hack (it also leaves an empty data.table behind rather than NULL):

rm_tbl_3 <- function(tbl) {
  tbl[, colnames(tbl) := rep(list(NULL), ncol(tbl))]
  invisible(NULL)
}

For completeness:

dt <- data.table(a = 1:3, b = 2:4)

rm_tbl_1(dt)
dt
rm_tbl_2(dt)
dt
rm_tbl_3(dt)
dt

Edit

As per the upvoted suggestion by @Gregor, some further explanation: the problem I'm facing is that I have a large data.table. Somewhere in a function I do something to this object, e.g. call data.table's split() method on it, after which I no longer need the original data.table. In order to do further transformations on my data, I need to reclaim the memory occupied by the original data.table. How do I do that?

An example:

fun_a <- function() {
  dt <- data.table(a = 1:2, b = 2:3)
  fun_b(dt)
}

fun_b <- function(tbl) {
  temp <- split(tbl, by = "b")
  rm_dt(tbl)
  do_stuff_with_dt(temp)
}

fun_a()

Does this clear things up? I'm sorry for not being clearer to begin with.

Upvotes: 2

Views: 724

Answers (2)

Scott Ritchie

Reputation: 10543

This should do the trick:

rm_dt <- function(name) {
  # Get data.table of all data.tables in global environment
  tbls <- tables(env=.GlobalEnv, silent=TRUE)
  # Look up the memory address of each data.table
  tbls <- tbls[, .(addr=eval(parse(text=sprintf("data.table::address(%s)", NAME)))), 
                 by = NAME]
  # Find all data.tables that have the same memory address as the one requested for deletion
  to_rm <- tbls[addr == tbls[NAME == name, addr], NAME]
  # Delete them
  rm(list=to_rm, pos=".GlobalEnv")
}

dt <- data.table(a=1)
dt2 <- dt
dt3 <- data.table(a=1)

rm_dt("dt") # should delete dt and dt2, but not dt3

Note that this only deletes references in the global environment. If you create a reference in another environment, it won't be deleted:

dt <- data.table(a=1)
dt2 <- dt
e <- new.env()
e$dt3 <- dt

# dt and dt2 will be removed, but e$dt3 will still exist
rm_dt("dt")
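
The address comparison that rm_dt() relies on can be checked directly. A minimal sketch, assuming data.table is attached; the variable names are only illustrative:

```r
library(data.table)

dt  <- data.table(a = 1)
dt2 <- dt          # plain assignment: a second name for the same object
dt3 <- copy(dt)    # explicit copy: a separate object with equal contents

# address() returns the memory address of the object as a string
same_as_dt2 <- identical(address(dt), address(dt2))  # same object, addresses match
same_as_dt3 <- identical(address(dt), address(dt3))  # separate object, addresses differ
```

This is why dt3 in the first example survives: it was created by a separate data.table() call and therefore lives at a different address.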

Upvotes: 2

SeGa

Reputation: 9809

This should work:

rm_tbl_4 <- function(tbl) {
  tbl = deparse(substitute(tbl))
  rm(list = tbl, pos = ".GlobalEnv")
}

dt <- data.table(a = 1:3, b = 2:4)
rm_tbl_4(dt)
dt

You could also pass the environment as a function argument, so you can decide where to delete it from:

rm_tbl_4 <- function(tbl, env) {
  tbl = deparse(substitute(tbl))
  rm(list = tbl, pos = env)
}

dt <- data.table(a = 1:3, b = 2:4)
rm_tbl_4(dt, env=".GlobalEnv")
dt
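
For the use case in the question, where the table is an argument of another function, the environment could default to the caller's frame. This is only a sketch: rm_tbl_5 and demo_local are hypothetical names, not part of data.table. It also only removes one binding, so the memory can be reclaimed by the garbage collector only once no other name (for example dt in the calling function) still refers to the object.

```r
library(data.table)

# Hypothetical variant: remove the binding from the caller's frame by default
rm_tbl_5 <- function(tbl, env = parent.frame()) {
  name <- deparse(substitute(tbl))  # variable name as written by the caller
  rm(list = name, envir = env)      # removes this binding only
  invisible(NULL)
}

# Hypothetical helper showing the effect inside a function
demo_local <- function(tbl) {
  rm_tbl_5(tbl)  # drops demo_local's local binding `tbl`
  exists("tbl", envir = environment(), inherits = FALSE)
}

dt <- data.table(a = 1:3, b = 2:4)
still_bound <- demo_local(dt)  # is `tbl` still bound inside demo_local?
```

After this runs, dt in the calling environment is untouched, because only the local name tbl was removed.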

Upvotes: 0
