Is it possible to delete a data.table object by reference? How could a function rm_tbl() be implemented that takes a data.table object and assigns NULL to any name pointing to this object in any environment (that is a descendant of the globalenv)?
Here are some examples that, for obvious reasons, do not work, but perhaps convey the idea of what I am trying to achieve:
rm_tbl_1 <- function(tbl) {
rm(tbl)
invisible(NULL)
}
rm_tbl_2 <- function(tbl) {
tbl <<- NULL
invisible(NULL)
}
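To make the "obvious reasons" concrete: rm() inside a function only removes the function's own binding, and `<<-` never touches the caller's name at all. The sketch below assumes both functions are defined at top level; rm_local_demo is a hypothetical variant of rm_tbl_1 that reports what happened to its local binding.

```r
library(data.table)

# Hypothetical variant of rm_tbl_1 that reports whether its local binding survived
rm_local_demo <- function(tbl) {
  rm(tbl)                          # removes only this function's own binding `tbl`
  exists("tbl", inherits = FALSE)  # FALSE: nothing named `tbl` is left in this frame
}

# Same as rm_tbl_2 in the question
rm_tbl_2 <- function(tbl) {
  tbl <<- NULL  # `<<-` skips the local frame; no `tbl` exists in the enclosing
                # (global) environment, so a new global `tbl` is created instead
  invisible(NULL)
}

dt <- data.table(a = 1:3, b = 2:4)
local_left <- rm_local_demo(dt)  # FALSE
rm_tbl_2(dt)                     # creates a global `tbl` holding NULL
nrow(dt)                         # still 3: `dt` itself was never touched
```

In both cases the caller's `dt` is a separate binding to the same object, which is why neither function can reach it.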
The following comes close, but is a bit of a hack (also, it does not result in NULL but in an empty data.table with zero columns and zero rows):
rm_tbl_3 <- function(tbl) {
tbl[, colnames(tbl) := rep(list(NULL), ncol(tbl))]
invisible(NULL)
}
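The reason rm_tbl_3 "comes close" is that `:=` modifies the object in place, so the change is visible through every name bound to it, while the names themselves survive. A small demonstration, using `(names(dt)) := NULL` as an equivalent way to drop all columns (an assumption that this matches rm_tbl_3 in effect):

```r
library(data.table)

dt  <- data.table(a = 1:3, b = 2:4)
dt2 <- dt  # no copy: both names are bound to the same object

# Delete every column by reference
dt[, (names(dt)) := NULL]

# The deletion is visible through every name bound to the object ...
n_cols_dt2 <- ncol(dt2)
# ... but both names still exist, each holding an empty data.table rather than NULL
```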
For completeness:
library(data.table)
dt <- data.table(a = 1:3, b = 2:4)
rm_tbl_1(dt)
dt
rm_tbl_2(dt)
dt
rm_tbl_3(dt)
dt
As per the upvoted suggestion by @Gregor, some further explanation: the problem I'm facing is that I have a large data.table. Somewhere in a function I do something to this object, e.g. call split() on it (data.table's split.data.table method), and after that I no longer need the original data.table. To do further transformations on my data, I need to get back the memory used by the original data.table. How do I do that?
An example:
fun_a <- function() {
dt <- data.table(a = 1:2, b = 2:3)
fun_b(dt)
}
fun_b <- function(tbl) {
temp <- split(tbl, by = "b")
rm_dt(tbl)
do_stuff_with_dt(temp)
}
fun_a()
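One possible restructuring, sketched here as an alternative rather than a by-reference delete: let the frame that creates the table also split it, so that frame holds the only binding and can drop it itself. fun_a_alt is a hypothetical name, and gc() only gives R the opportunity to reclaim the memory; it does not guarantee when the operating system sees it.

```r
library(data.table)

# Hypothetical restructuring of fun_a: the frame that owns the only binding
# to the original table is also the one that removes it.
fun_a_alt <- function() {
  dt <- data.table(a = 1:2, b = 2:3)
  temp <- split(dt, by = "b")  # list of data.tables, one per value of b
  rm(dt)                       # drop the only binding to the original table
  invisible(gc())              # give R the chance to reclaim that memory now
  temp                         # hand the pieces on, e.g. to do_stuff_with_dt()
}

res <- fun_a_alt()
```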
Does this clear things up? I'm sorry for not being clearer to begin with.
This should do the trick:
rm_dt <- function(name) {
  # Get a data.table listing all data.tables in the global environment
  tbls <- tables(env = .GlobalEnv, silent = TRUE)
  # Look up the memory address of each data.table (get() avoids eval(parse()))
  tbls <- tbls[, .(addr = data.table::address(get(NAME, envir = .GlobalEnv))),
               by = NAME]
  # Find all data.tables that share the address of the one requested for deletion
  to_rm <- tbls[addr == tbls[NAME == name, addr], NAME]
  # Delete them
  rm(list = to_rm, envir = .GlobalEnv)
}
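The comparison above works because assigning a data.table to another name does not copy it, so both names report the same address(), while a separately created table with identical contents does not. A quick check of that assumption:

```r
library(data.table)

dt  <- data.table(a = 1)
dt2 <- dt                 # another name for the same object
dt3 <- data.table(a = 1)  # equal contents, but a different object

same_obj <- identical(address(dt), address(dt2))
diff_obj <- !identical(address(dt), address(dt3))
```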
dt <- data.table(a=1)
dt2 <- dt
dt3 <- data.table(a=1)
rm_dt("dt") # should delete dt and dt2, but not dt3
Note that this only deletes references in the global environment; a reference created in another environment won't be deleted:
dt <- data.table(a=1)
dt2 <- dt
e <- new.env()
e$dt3 <- dt
# dt and dt2 will be removed, but e$dt3 will still exist
rm_dt("dt")
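Base R has no built-in way to enumerate every binding to an object, so covering other environments means listing them yourself. A minimal sketch under that assumption; rm_dt_envs is a hypothetical helper that takes the object itself (not its name) and scans only the environments you pass in:

```r
library(data.table)

# Hypothetical helper: remove every binding, in each of the given environments,
# that points to the same data.table object as `x` (compared via address()).
rm_dt_envs <- function(x, envs = list(.GlobalEnv)) {
  target <- address(x)
  for (env in envs) {
    for (nm in ls(envir = env)) {
      obj <- get(nm, envir = env)
      if (is.data.table(obj) && identical(address(obj), target)) {
        rm(list = nm, envir = env)
      }
    }
  }
  invisible(NULL)
}

dt  <- data.table(a = 1)
dt2 <- dt
dt3 <- data.table(a = 1)
e <- new.env()
e$dt4 <- dt
rm_dt_envs(dt, envs = list(.GlobalEnv, e))  # removes dt, dt2 and e$dt4, keeps dt3
```

Note that `exists("dt")` alone is misleading here, because stats::dt (the t density) is found on the search path; check with `inherits = FALSE`.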
This should work:
rm_tbl_4 <- function(tbl) {
tbl = deparse(substitute(tbl))
rm(list = tbl, pos = ".GlobalEnv")
}
dt <- data.table(a = 1:3, b = 2:4)
rm_tbl_4(dt)
dt
You could also pass the environment as a function argument, so you can decide where to delete the object from.
rm_tbl_4 <- function(tbl, env) {
tbl = deparse(substitute(tbl))
rm(list = tbl, pos = env)
}
dt <- data.table(a = 1:3, b = 2:4)
rm_tbl_4(dt, env=".GlobalEnv")
dt
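A variant that defaults to the caller's environment instead of the global one, so it also works when called from inside another function. rm_tbl_5 is a hypothetical name, and like rm_tbl_4 it assumes the argument is passed as a bare variable name; it removes only that one binding, not other names pointing to the same object.

```r
library(data.table)

rm_tbl_5 <- function(tbl, env = parent.frame()) {
  # The default is evaluated inside rm_tbl_5, so parent.frame() is the caller's frame
  rm(list = deparse(substitute(tbl)), envir = env)
  invisible(NULL)
}

# Called inside a function: removes that function's local `x`
f <- function() {
  x <- data.table(a = 1)
  rm_tbl_5(x)
  exists("x", inherits = FALSE)
}
removed_locally <- !f()

# Called at top level: the caller is the global environment
g_dt <- data.table(a = 1)
rm_tbl_5(g_dt)
```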