Tomas

Reputation: 59505

How to clean up the function closure (environment) when returning and saving it?

I have a computation like this (note that this is a heavily simplified, cut-down version: a minimal reproducible example):

computation <- function() # simplified version!
{
    # a lot of big matrices here....
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))

    # after computation, it returns the model
    list(
        some_info = 5.18,
        prior = prior
    )
}

This function fits and returns a model, which I want to save to disk:

m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713946

Unfortunately, as you can see, the file is too large, because it contains the whole closure of the function prior(), and this closure contains all the data from the computation() function, including big_matrix (there are many such matrices in my full code).

Now, I tried to fix it by redefining the environment (closure) of the prior function using environment(prior) <- list2env(list(exp.value = exp.value)):

exp.value <- 4.5
environment(m$prior) <- list2env(list(exp.value = exp.value))
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 475

This works as expected! Unfortunately, when I put this clean-up code into the computation() function (in fact, into any function), it stops working. See:

computation <- function() # simplified version!
{
    # a lot of big matrices here....
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))
    environment(prior) <- list2env(list(exp.value = exp.value)) # this is the update

    # after computation, it returns the model
    list(
        some_info = 5.18,
        prior = prior
    )
}
m <- computation()
save(m, file = "tmp.Rdata")
file.info("tmp.Rdata")$size
# [1] 30713151

The file is huge again; the closure was not cleaned up correctly.

  1. What is going on here? Why does the clean-up code work when run outside of any function but stop working inside a function?
  2. How can I make it work inside a function?

Upvotes: 6

Views: 215

Answers (3)

thc

Reputation: 9705

Since you aren't using functional programming, this is a good use case for R6 classes:

library(R6)
Computation <- R6Class("Computation", list(
  exp.value = NULL,
  prior = function (x) rep(self$exp.value, nrow(x)),
  initialize = function(exp.value) {
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)
    self$exp.value <- exp.value
  }
))

m <- Computation$new(4.5)
saveRDS(m, file = "/tmp/test.rds")
file.info("/tmp/test.rds")$size
# [1] 2585

m$prior(data.frame(1:10))
# [1] 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5 4.5

Upvotes: 0

moodymudskipper

Reputation: 47320

Rather than choosing what to remove, as MrFlick proposed, you might want to choose what to keep. That reduces the chance of mistakes in more complex code and might be less verbose.

I like to state this kind of action at the top of my function's body using on.exit() so it's obvious when reading the code that the closure's environment is relevant, and it won't interfere with the rest of the code.

computation <- function() # simplified version!
{
  on.exit(rm(list=setdiff(ls(), "exp.value")))

  # a lot of big matrices here....
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))

  # after computation, it returns the model
  list(
    some_info = 5.18,
    prior = prior
  )
}
m <- computation()
file <- tempfile(fileext = ".Rdata")
save(m, file = file)
file.info(file)$size
#> [1] 2830
m$prior(data.frame(a=1:2))
#> [1] 4.5 4.5
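
As a rough check of the effect, the serialized size can also be compared in memory with serialize(), without writing a file. This is only a sketch: the names with_cleanup and without_cleanup are hypothetical, and the matrix is smaller than in the question so that it runs quickly.

```r
# Same structure as computation(), with the on.exit() clean-up
with_cleanup <- function() {
  on.exit(rm(list = setdiff(ls(), "exp.value")))  # keep only exp.value
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  list(some_info = 5.18, prior = prior)
}

# Identical, but without the clean-up
without_cleanup <- function() {
  big_matrix <- matrix(rnorm(200 * 200), nrow = 200, ncol = 200)
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  list(some_info = 5.18, prior = prior)
}

m_clean <- with_cleanup()
m_full  <- without_cleanup()

# serialize(x, NULL) returns a raw vector; its length is the size in bytes
size_clean <- length(serialize(m_clean, NULL))
size_full  <- length(serialize(m_full, NULL))
size_clean < size_full

# Only the whitelisted variable is left in the closure's environment
ls(environment(m_clean$prior))
#> [1] "exp.value"
```

Note that prior itself is also removed from the environment by the on.exit() call; the function survives because the returned list still refers to it.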

Upvotes: 0

MrFlick

Reputation: 206253

One way to fix the problem is to remove the large variable from the environment before returning.

computation <- function() 
{
    big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

    exp.value <- 4.5
    prior <- function (x) rep(exp.value, nrow(x))

    rm(big_matrix) ## remove variable

    list(
        some_info = 5.18,
        prior = prior
    )
}
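
To see what the closure still captures after rm(), you can list the contents of the function's environment. The following is a small sketch along the lines of the function above, with a smaller matrix as an assumption to keep it fast:

```r
computation <- function() {
  big_matrix <- matrix(0, nrow = 200, ncol = 200)  # stand-in for the large data
  exp.value <- 4.5
  prior <- function(x) rep(exp.value, nrow(x))
  rm(big_matrix)  # drop the large variable before returning
  list(some_info = 5.18, prior = prior)
}

m <- computation()

# Everything still bound in the frame of computation() travels with prior()
ls(environment(m$prior))
# [1] "exp.value" "prior"
```

Anything that shows up in this listing is saved together with the function, so every large object has to be removed explicitly with this approach.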

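The problem with your list2env method is that, by default, the new environment's parent is the calling environment (parent = parent.frame()). Inside computation() that is the function's own frame, so you are capturing everything inside the function anyway. At top level the parent is the global environment, which save() stores only as a reference, and that is why the same code worked there. You can instead specify the global environment as the parent explicitly: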

computation <- function() 
{
  big_matrix <- matrix(rnorm(2000*2000), nrow = 2000, ncol = 2000)

  exp.value <- 4.5
  prior <- function (x) rep(exp.value, nrow(x))
  # explicit parent: the global environment instead of this function's frame
  environment(prior) <- list2env(list(exp.value = exp.value), parent = globalenv())

  list(
    some_info = 5.18,
    prior = prior
  )
}

(If you specify emptyenv() instead, the function won't be able to find built-in functions such as rep() and nrow().)
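
The difference between the two parents can be checked directly with parent.env(). This is a minimal sketch; make_default, make_global, and big are hypothetical names used only for illustration:

```r
make_default <- function() {
  big <- numeric(1e5)  # stand-in for the large matrices
  f <- function(x) rep(exp.value, nrow(x))
  # parent defaults to parent.frame(), i.e. the frame of make_default()
  environment(f) <- list2env(list(exp.value = 4.5))
  f
}

make_global <- function() {
  big <- numeric(1e5)
  f <- function(x) rep(exp.value, nrow(x))
  # explicit parent: nothing from this frame stays reachable through f
  environment(f) <- list2env(list(exp.value = 4.5), parent = globalenv())
  f
}

f1 <- make_default()
f2 <- make_global()

# The default parent is the function frame, which still holds `big`
exists("big", envir = parent.env(environment(f1)), inherits = FALSE)
# [1] TRUE

# With the explicit parent, the chain goes straight to the global environment
identical(parent.env(environment(f2)), globalenv())
# [1] TRUE
```

Because the parent environment is serialized along with the closure's own environment, f1 drags big into the saved file while f2 does not.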

Upvotes: 5
