Reputation: 59505
I have a computation like this (note that this is a heavily simplified, cut-down version: the smallest reproducible example):
computation <- function() # simplified version!
{
  # a lot of big matrices here....
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  # after computation, it returns the model
  list(
    some_info = 5.18,
    prior = prior
  )
}
This function fits and returns a model, which I want to save to disk:
m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713946
Unfortunately, as you can see, the file is too large, since it contains the whole closure of the function prior(), and this closure contains all the data from the computation() function, including big_matrix (there are lots of such matrices in my full code).
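To make that concrete, here is a minimal check, written as a sketch. The names computation_small and m_small are hypothetical and introduced only for this illustration, and the matrix is shrunk so the example runs quickly. It lists what the environment attached to the returned prior actually holds:

```r
# Illustrative only: same shape as computation(), with a smaller matrix.
computation_small <- function() {
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  list(some_info = 5.18, prior = prior)
}

m_small <- computation_small()

# The closure's environment is the whole call frame of computation_small()
env <- environment(m_small$prior)
ls(env)
# [1] "big_matrix" "exp.value"  "prior"

# The matrix is still reachable through the closure, so save() serializes it
object.size(env$big_matrix)
```

Everything listed by ls(env) is written to disk along with prior, which would explain the file size.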
Now, I tried to fix it by redefining the environment (closure) of the prior function using environment(prior) <- list2env(list(exp.value = exp.value)):
exp.value <- 4.5
environment(m$prior) <- list2env(list(exp.value = exp.value))
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 475
This works as expected! Unfortunately, when I put this cleanup code into the computation() function (in fact, when I put it into any function), it stops working! See:
computation <- function() # simplified version!
{
  # a lot of big matrices here....
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  environment(prior) <- list2env(list(exp.value = exp.value)) # this is the update
  # after computation, it returns the model
  list(
    some_info = 5.18,
    prior = prior
  )
}
m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713151
The file is huge again; the closure was not cleaned up correctly.
Upvotes: 6
Views: 215
Reputation: 9705
Since you aren't using functional programming, this is a good use case for R6 classes:
library(R6)
Computation <- R6Class("Computation", list(
  exp.value = NULL,
  prior = function (x) rep(self$exp.value, nrow(x)),
  initialize = function(exp.value) {
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
    self$exp.value <- exp.value
  }
))
m <- Computation$new(4.5)
saveRDS(m, file = "/tmp/test.rds")
file.info("/tmp/test.rds")$size
# [1] 2585
m$prior(data.frame(1:10))
# [1] 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5
Upvotes: 0
Reputation: 47320
Rather than choosing what to remove, as MrFlick proposed, you might want to choose what to keep. That reduces the chance of mistakes in more complex code and might be less verbose.
I like to state this kind of action at the top of my function's body using on.exit(), so it's obvious when reading the code that the closure's environment is relevant, and it won't interfere with the rest of the code.
computation <- function() # simplified version!
{
  on.exit(rm(list=setdiff(ls(), "exp.value")))
  # a lot of big matrices here....
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  # after computation, it returns the model
  list(
    some_info = 5.18,
    prior = prior
  )
}
m <- computation()
file <- tempfile(fileext = ".Rdata")
save(m, file = file)
file.info(file)$size
#> [1] 2830
m$prior(data.frame(a=1:2))
#> [1] 4.5 4.5
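As a quick sanity check of the approach above, the sketch below inspects the closure's environment after the function has returned. The names computation_keep and m_keep are hypothetical, and the matrix is smaller than in the question so the example runs fast:

```r
# Illustrative only: keep just exp.value in the frame when the function exits.
computation_keep <- function() {
  # Runs after the return value has been built, so prior is still returned
  on.exit(rm(list = setdiff(ls(), "exp.value")))
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  list(some_info = 5.18, prior = prior)
}

m_keep <- computation_keep()

# Only the variable we chose to keep survives in the closure's environment
ls(environment(m_keep$prior))
# [1] "exp.value"

m_keep$prior(data.frame(a = 1:2))
# [1] 4.5 4.5
```

The returned list still holds the prior function itself; only the binding named prior inside the frame is removed, together with big_matrix.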
Upvotes: 0
Reputation: 206253
One way to fix the problem is to remove the large variable from the environment before returning.
computation <- function()
{
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  rm(big_matrix) ## remove variable
  list(
    some_info = 5.18,
    prior = prior
  )
}
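The problem with your list2env method is that, by default, the new environment's parent is the environment list2env() was called from (the default is parent = parent.frame()), so you are capturing everything inside the function anyway. At the top level that parent is the global environment, which is why your first attempt worked. You can instead specify the global environment as the parent environment explicitly: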
computation <- function()
{
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  # explicit parent
  environment(prior) <- list2env(list(exp.value = exp.value), parent=globalenv())
  list(
    some_info = 5.18,
    prior = prior
  )
}
(If you specify emptyenv(), then you won't be able to find built-in functions like rep().)
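To see the parent-environment difference directly, here is a small sketch. The helper names make_env_default and make_env_global are hypothetical and exist only for this illustration; each builds an environment with list2env() from inside a function that also has a local big_matrix:

```r
# Illustrative only: compare the parent of the environment list2env() creates.
make_env_default <- function() {
  big_matrix <- matrix(0, nrow = 10, ncol = 10)
  # parent defaults to parent.frame(), i.e. this function's own frame
  list2env(list(exp.value = 4.5))
}

make_env_global <- function() {
  big_matrix <- matrix(0, nrow = 10, ncol = 10)
  # explicit parent: the function's frame is not captured
  list2env(list(exp.value = 4.5), parent = globalenv())
}

e1 <- make_env_default()
e2 <- make_env_global()

# With the default, the parent is the calling frame, which still holds big_matrix
ls(parent.env(e1))
# [1] "big_matrix"

# With an explicit parent, the chain goes straight to the global environment
identical(parent.env(e2), globalenv())
# [1] TRUE
```

Since save() follows the chain of parent environments until it reaches the global environment, the first variant would drag big_matrix into the file while the second would not.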
Upvotes: 5