Konrad

Reputation: 18585

Correct approach to caching a computationally expensive object generated within a function

Background

I'm looking at the following workflow:

  1. A top-level runner function (running_function in the example below) calls a number of smaller functions.
  2. Some of those functions are computationally expensive and will be called on the same set of arguments repeatedly, because the runner function is called repeatedly by a top-level script.

Example

Without any attempt at caching, the situation can be summarised as follows:

Work functions

painful_function <- function(n = 100) {
  matrix(1:n * n, nrow = n)
}

running_function <-
  function(stat_to_do = c("min", "max", "mean", "sum"),
           painful_size = 1e4) {
    stat_to_do <- match.arg(stat_to_do)

    M_pain <- painful_function(n = painful_size)
    do.call(stat_to_do, list(M_pain))

  }

Actual job

# Object M_pain is created inside running_function
running_function(stat_to_do = "min", painful_size = 100)
# I would like to re-use the M_pain object from the previous function
running_function(stat_to_do = "max", painful_size = 100)
# Re-using M_pain again...
running_function(stat_to_do = "mean", painful_size = 100)
# And again ...
running_function(stat_to_do = "sum", painful_size = 100)

Desired outcome

The idea is to avoid calling painful_function more than once, as the object it generates is identical in each of the scenarios. running_function should still be evaluated with the provided arguments.
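A minimal in-memory sketch of this behaviour, assuming the cache only needs to live for the current R session. make_cached() and running_function_cached() are hypothetical names, and results are keyed on the value of n alone. The memoise package's memoise() provides the same idea ready-made.

```r
# Hypothetical helper: wraps a one-argument function so that each distinct
# value of `n` is computed once and then served from a private environment.
make_cached <- function(f) {
  cache <- new.env(parent = emptyenv())     # survives between calls
  function(n) {
    key <- format(n, scientific = FALSE)     # e.g. 1e6 -> "1000000"
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(n), envir = cache)       # compute only on a cache miss
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Same work function as in the question.
painful_function <- function(n = 100) {
  matrix(1:n * n, nrow = n)
}

painful_cached <- make_cached(painful_function)

running_function_cached <-
  function(stat_to_do = c("min", "max", "mean", "sum"),
           painful_size = 1e4) {
    stat_to_do <- match.arg(stat_to_do)
    M_pain <- painful_cached(painful_size)   # reused for repeated sizes
    do.call(stat_to_do, list(M_pain))
  }
```

Calling running_function_cached() four times with painful_size = 100 runs painful_function() only on the first call; the cache disappears when the session ends.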

Approach

I was thinking of making use of the mustashe package:

library("mustashe")
running_function_mstash <-
  function(stat_to_do = c("min", "max", "mean", "sum"),
           painful_size = 1e4) {
    stat_to_do <- match.arg(stat_to_do)

    stash(var = "M_pain",
          code = {
            painful_function(n = painful_size)
          },
          depends_on = "painful_size")
    do.call(stat_to_do, list(M_pain))
  }

This returns the following error:

running_function_mstash(stat_to_do = "min", painful_size = 1e6)

Error in make_hash(depends_on, .TargetEnv) : Some dependencies are missing from the environment.

Questions

I'm interested in learning the following:

  1. How to make this work, i.e. running_function should only execute painful_function if one of the arguments passed down changes; otherwise the stored object is loaded from a file.
  2. What better approaches are there? A trivial, "brute force" one would be to create a temporary RDS file with a funky name and only execute painful_function if the file doesn't exist. This approach is crude and has obvious drawbacks. I would like to find a robust solution that covers similar, workable scenarios.
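The "brute force" RDS idea can be sketched in base R as follows. painful_function_rds() and its cache_dir argument are hypothetical; the file name encodes only the value of n, so the cache silently goes stale if painful_function() itself changes, which is one of the drawbacks of this approach. As an off-the-shelf alternative, memoise (version 2.0 or later) can persist results to disk by passing cache = cachem::cache_disk(...) to memoise().

```r
# Same work function as in the question.
painful_function <- function(n = 100) {
  matrix(1:n * n, nrow = n)
}

# Hypothetical file-backed wrapper: each distinct value of `n` gets its own
# RDS file, which is written once and read back on later calls.
painful_function_rds <- function(n = 100, cache_dir = tempdir()) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  key <- format(n, scientific = FALSE)              # e.g. 1e6 -> "1000000"
  path <- file.path(cache_dir, paste0("painful_", key, ".rds"))
  if (!file.exists(path)) {
    saveRDS(painful_function(n), path)              # computed only on a miss
  }
  readRDS(path)
}
```

Because tempdir() is removed at the end of the session, pointing cache_dir at a persistent directory is what would let the cache survive between runs of the top-level script.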

Upvotes: 2

Views: 43

Answers (1)

akrun

Reputation: 887281

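It may be that the dependency is not getting detected: judging by the error (make_hash(depends_on, .TargetEnv)), stash() looks up the names in depends_on in its target environment (the global environment) rather than in the function's local environment, where painful_size lives. According to the example in ?stash, we need to use <<-: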

running_function_mstash <-
  function(stat_to_do = c("min", "max", "mean", "sum"),
           painful_size = 1e4) {
    stat_to_do <- match.arg(stat_to_do)
    painful_size <<- painful_size  # copy the argument into the global environment so stash() can find it
    stash(var = "M_pain",
          code = {
            painful_function(n = painful_size)
          },
          depends_on = "painful_size")
    do.call(stat_to_do, list(M_pain))
  }

running_function_mstash(stat_to_do = "min", painful_size = 1e6)
#Stashing object.
#[1] 1e+06
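Note that <<- has a side effect: because running_function_mstash() is defined at the top level, the assignment creates (or overwrites) painful_size in the global environment, which is where stash() then finds it. A minimal base R illustration, using a hypothetical show_leak() function that mimics that one line:

```r
# Hypothetical function mimicking the answer's `painful_size <<- painful_size`.
show_leak <- function(painful_size = 1e4) {
  # `<<-` assigns in the first enclosing environment that already has
  # `painful_size`; if none does, it assigns in the global environment.
  painful_size <<- painful_size
  invisible(NULL)
}

# Start from a clean global environment for this name.
if (exists("painful_size", envir = globalenv(), inherits = FALSE)) {
  rm("painful_size", envir = globalenv())
}
exists("painful_size", envir = globalenv(), inherits = FALSE)  # FALSE
show_leak(painful_size = 100)
get("painful_size", envir = globalenv())                       # 100
```

For a function defined at the top level, assign("painful_size", painful_size, envir = globalenv()) makes the same global assignment more explicit; either way, each call leaves painful_size behind in the global environment.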

Upvotes: 1
