BioBroo

Reputation: 683

Is calling environments in R computationally expensive?

I have a master user-defined function with many inputs. The master function calls a user-defined function, which calls another, which in turn calls another, and so on, each time using a smaller subset of the inputs. I came up with two ways of passing the inputs to lower level functions:

  1. manually, and
  2. having each lower-level function get the inputs from the master function.

Solution 1) is typing-intensive, and I suspect it is not what a more experienced programmer would do. Solution 2) seems neater, but it takes much longer to run. So I have two questions: A) Why does solution 2) take more time? B) Is there an even better solution than either of these, one that reduces manual work by the programmer and is computationally efficient? This kind of programming scenario has come up for me in my biology research, as well as in coding up statistical methods, so I assume this is a common problem that others have solved.

I have included a simple example (adding 5 numbers) of the two solutions below, along with timing.

# Solution 1)
f0 <- function(a0,a1,a2,a3,a4){
  val <- a0 + f1(a1=a1,a2=a2,a3=a3,a4=a4)
  return(val)
}

f1 <- function(a1,a2,a3,a4){
  val <- a1 + f2(a2=a2,a3=a3,a4=a4)
  return(val)
}

f2 <- function(a2,a3,a4){
  val <- a2 + f3(a3=a3,a4=a4)
  return(val)
}

f3 <- function(a3,a4){
  val <- a3 + f4(a4=a4)
  return(val)
}

f4 <- function(a4){
  val <- a4
  return(val)
}

# Solution 2)

g0 <- function(a0,a1,a2,a3,a4){
  vars <- list('a0','a1','a2','a3','a4')
  env <<- environment()
  val <- a0 + g1()
  return(val)
}

g1 <- function(){
  for (i in get('vars',env)){assign(i,get(i,env),environment())}
  val <- a1 + g2()
  return(val)
}

g2 <- function(){
  for (i in get('vars',env)){assign(i,get(i,env),environment())}
  val <- a2 + g3()
  return(val)
}

g3 <- function(){
  for (i in get('vars',env)){assign(i,get(i,env),environment())}
  val <- a3 + g4()
  return(val)
}

g4 <- function(){
  for (i in get('vars',env)){assign(i,get(i,env),environment())}
  val <- a4
  return(val)
}

# Timing
t0 <- Sys.time()
replicate(1e4, f0(1,2,3,4,5))
t1 <- Sys.time()

tt0 <- Sys.time()
replicate(1e4, g0(1,2,3,4,5))
tt1 <- Sys.time()

# Time: Solution 1)
> t1-t0
Time difference of 0.2921922 secs

# Time: Solution 2)
> tt1-tt0
Time difference of 0.953675 secs

Upvotes: 0

Views: 156

Answers (3)

Roland

Reputation: 132854

Use ... to pass parameters to the subsequent functions:

f0 <- function(a0, ...){
  val <- a0 + f1(...)
  return(val)
}

f1 <- function(a1, ...){
  val <- a1 + f2(...)
  return(val)
}

f2 <- function(a2, ...){
  val <- a2 + f3(...)
  return(val)
}

f3 <- function(a3, ...){
  val <- a3 + f4(...)
  return(val)
}

f4 <- function(a4){
  val <- a4
  return(val)
}

f0(1,2,3,4,5)
#[1] 15
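Two properties of this pattern are worth knowing, and the following minimal sketch illustrates them with a shortened, hypothetical chain (p0, p1 and p2 are illustrative names, not part of the code above). Named arguments are matched by name at each level, so their order does not matter, and an argument that no function consumes ends up at the last function, which rejects it because it has no ... of its own.

```r
# Hypothetical three-level chain, shortened from the f0..f4 pattern.
p0 <- function(a0, ...) a0 + p1(...)
p1 <- function(a1, ...) a1 + p2(...)
p2 <- function(a2) a2   # last level has no `...`, so extra arguments are an error

# Named arguments are picked out by name at each level; order is irrelevant.
by_name <- p0(a2 = 3, a0 = 1, a1 = 2)
print(by_name)
# [1] 6

# A fourth value is never consumed, travels down to p2, and raises an error
# there. tryCatch() captures the error message instead of stopping the script.
extra <- tryCatch(p0(1, 2, 3, 99), error = function(e) conditionMessage(e))
print(extra)
```

If silently ignoring unused arguments is preferable, giving the last function a ... parameter as well would do that, at the cost of hiding typos in argument names.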

Regarding A): Each function call costs time, and solution 2 adds several get and assign calls at every level. I think assign in particular is not very fast.
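To check this on your own machine, a rough sketch along the following lines may help. It compares direct argument passing with the copy-by-get/assign pattern, using system.time rather than differences of Sys.time. The function names are hypothetical, the chain is cut to two levels, and the environment is passed as an argument instead of being stored globally with <<-, so the numbers will not match the question's exactly.

```r
# Direct argument passing (the pattern of solution 1), two levels only.
direct_outer <- function(a, b) a + direct_inner(b)
direct_inner <- function(b) b

# Copying variables out of another environment (the pattern of solution 2).
# Assumption: env and vars are passed explicitly rather than via a global.
copy_outer <- function(a, b) {
  vars <- c("a", "b")
  env <- environment()
  a + copy_inner(env, vars)
}
copy_inner <- function(env, vars) {
  # One get() and one assign() per variable, on every single call.
  for (v in vars) assign(v, get(v, envir = env), envir = environment())
  b
}

n <- 1e4
time_direct <- system.time(for (i in seq_len(n)) direct_outer(1, 2))
time_copy   <- system.time(for (i in seq_len(n)) copy_outer(1, 2))

# Elapsed seconds for each variant; the values depend on the machine.
print(c(direct = time_direct[["elapsed"]], copy = time_copy[["elapsed"]]))
```

Both variants return the same value, so any difference in elapsed time comes from the extra get and assign calls plus the loop around them.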

Upvotes: 2

Gregor Thomas

Reputation: 145965

You could pass around a named list(), or even create your own class based on a list. This is more-or-less how most models work in R: an lm object is a big list and there are lots of functions (predict, summary, coef, AIC, plot, etc.) that use whatever parts of the object that they need.

# Solution 4)
h0 <- function(arg_list){
  arg_list$a0 + h1(arg_list)
}

h1 <- function(arg_list){
  arg_list$a1 + h2(arg_list)
}

h2 <- function(arg_list){
  arg_list$a2 + h3(arg_list)
}

h3 <- function(arg_list) {
  arg_list$a3 + h4(arg_list)
}

h4 <- function(arg_list) {
  arg_list$a4
}


h0(list(a0 = 1, a1 = 2, a2 = 3, a3 = 4, a4 = 5))
# [1] 15

This has the advantage that you don't have to worry too much about exact dependencies. If h2 calls h3 and you edit h3 to use another piece of the list, you don't have to also edit h2 to pass through the right argument, since you're passing the whole object around.

Imagine how annoying it would be if you had to call summary.lm with exactly the pieces of a model that are used by summary and nothing else. Instead of summary(my_model) you'd have summary(rank = my_model$rank, resid = my_model$residuals, df_resid = my_model$df.residual, w = my_model$weights, ...) and on and on for half or more of the elements of the model!
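The "class based on a list" variant mentioned at the start could look roughly like the sketch below. The constructor new_inputs, the class name "inputs", the generic total and the helper tail_sum are all hypothetical names chosen for illustration; the point is only that a classed list lets you dispatch methods on the bundle of inputs, the same way summary and predict dispatch on an lm object.

```r
# Hypothetical constructor: bundles the inputs into a list with a class tag.
new_inputs <- function(a0, a1, a2, a3, a4) {
  structure(list(a0 = a0, a1 = a1, a2 = a2, a3 = a3, a4 = a4),
            class = "inputs")
}

# Hypothetical helper that uses only the pieces of the object it needs.
tail_sum <- function(x) x$a1 + x$a2 + x$a3 + x$a4

# Hypothetical S3 generic and its method for the "inputs" class.
total <- function(x, ...) UseMethod("total")
total.inputs <- function(x, ...) x$a0 + tail_sum(x)

inp <- new_inputs(1, 2, 3, 4, 5)
total(inp)
# [1] 15
```

A constructor also gives you one place to validate the inputs, which is harder to do when the values are passed around as loose arguments.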

Upvotes: 2

Richie Cotton

Reputation: 121127

In general, solution 1 is probably easier to maintain, since each function is clear about what it is supposed to do. The time taken to understand your code in the future will be less.

A generalised "best" solution is difficult; in this case it is trivially just a0 + a1 + a2 + a3 + a4, though what the real functions are and how they interact will greatly affect your solution.

As to why solution 2 takes so long: it isn't just looking up variables in an environment; you are also doing lots of getting and assigning.

I also don't think the function is doing what you think it is doing, since vars isn't stored in the environment env.

You could consider storing the variables in the global environment like this:

a1 <- 1
a2 <- 2
a3 <- 4
a4 <- 8


h1 <- function(){
  a1 + h2()
}

h2 <- function(){
  a2 + h3()
}

h3 <- function(){
  a3 + h4()
}

h4 <- function(){
  a4
}

h1()
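If putting the inputs in the global environment feels too loose, a related option is to define the helpers inside the master function. This is only a sketch with hypothetical names (k0 to k4), and it assumes the helpers are not needed anywhere else. They find a1 to a4 through R's lexical scoping in the same way the functions above find them in the global environment, but nothing leaks outside the call.

```r
# Hypothetical master function. The helpers are created inside it, so their
# enclosing environment is k0's call frame and a1..a4 are found there.
k0 <- function(a0, a1, a2, a3, a4) {
  k1 <- function() a1 + k2()
  k2 <- function() a2 + k3()
  k3 <- function() a3 + k4()
  k4 <- function() a4
  a0 + k1()
}

k0(1, 2, 3, 4, 5)
# [1] 15
```

The trade-off is that the helpers are re-created on every call to k0 and cannot be tested on their own.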

Upvotes: 0
