Reputation: 6891
Approach 1
f1 <- function(x)
{
  # Do calculation xyz ....
  f2 <- function(y)
  {
    # Do stuff...
    return(some_object)
  }
  return(f2(x))
}
Approach 2
f2 <- function(y)
{
  # Do stuff...
  return(some_object)
}

f3 <- function(x)
{
  # Do calculation xyz ....
  return(f2(x))
}
Assume `f1` and `f3` both do the same calculations and give the same result. Are there any significant advantages in using approach 1, calling `f1()`, vs approach 2, calling `f3()`?
Is a certain approach more favourable when:

- large data is being passed in and/or out of `f2`?
- speed is a big issue, e.g. `f1` or `f3` are called repeatedly in simulations?

(Approach 1 seems common in packages, defining one function inside another.)
One advantage of approach 1 is that `f2` won't exist outside `f1` once `f1` has finished being called (and `f2` is only called in `f1` or `f3`).
Upvotes: 12
Views: 995
Reputation: 6891
An example of what is mentioned in the existing answers is probably the most useful benefit of defining a function in the environment of another function. In simple terms: you can define a function without specifying all the parameters it uses, provided those parameters are defined somewhere in the environment in which the function is defined. A good reference for function environments is: https://adv-r.hadley.nz/environments.html
This approach can be handy for breaking up blocks of code in a function into a set of sub-functions defined in the function's environment. Where many variables are required and referred to within the function body, this allows a cleaner representation of the code without having to write out a potentially long parameter list for each sub-function.
A simple dummy example below highlights the point:
f1 <- function(x)
{
  f2 <- function(y)
  {
    # possibly long block of code relevant to the meaning of what `f2` represents
    y + a + b + d
  }
  # might be 10+ variables in special cases
  a <- 10
  b <- 5
  d <- 1
  f2(x)
}
#test:
> f1(100)
[1] 116
You can't use this approach if you define the functions with separate parent environments:
f3 <- function(x)
{
  a <- 10
  b <- 5
  d <- 1
  f2a(x)
}

f2a <- function(y)
{
  y + a + b + d
}
> f3(100)
Error in f2a(x) : object 'a' not found
Upvotes: 0
Reputation: 52647
Benefits of defining `f2` inside `f1`:

- `f2` is only visible within `f1`, which is useful if `f2` is only meant for use within `f1`; though within package namespaces this is debatable, since you just wouldn't export `f2` if you defined it outside.
- `f2` has access to variables within `f1`, which could be considered a good or a bad thing:
  - good: you can use `<<-` from `f2` to update objects in `f1`'s environment and implement stuff like memoization, etc.

Disadvantages:

- `f2` needs to be redefined every time you call `f1`, which adds some overhead (not very much overhead, but definitely there).

Data size should not matter, since R won't copy the data unless it is being modified under either scenario. As noted in the disadvantages, defining `f2` outside of `f1` should be a little faster, especially if you are repeating an otherwise relatively low-overhead operation many times. Here is an example:
> fun1 <- function(x) {
+ fun2 <- function(x) x
+ fun2(x)
+ }
> fun2a <- function(x) x
> fun3 <- function(x) fun2a(x)
>
> library(microbenchmark)
> microbenchmark(
+ fun1(TRUE), fun3(TRUE)
+ )
Unit: nanoseconds
expr min lq median uq max neval
fun1(TRUE) 656 674.5 728.5 859.5 17394 100
fun3(TRUE) 406 434.5 480.5 563.5 1855 100
In this case we save 250ns (edit: the difference is actually 200ns; believe it or not, the extra set of `{}` that `fun1` has costs another 50ns). Not much, but it can add up if the interior function is more complex or you repeat the function many, many times.
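The claim above that R won't copy data unless it is modified can be checked directly with base R's `tracemem()` (a sketch, not from the original answer; it assumes an R build with memory profiling enabled, which is the default for CRAN binaries):

```r
big <- rnorm(1e6)
tracemem(big)  # report whenever `big` is duplicated

touch  <- function(x) length(x)         # read-only: no copy
modify <- function(x) { x[1] <- 0; x }  # first write duplicates x

touch(big)   # prints no tracemem message: passing and reading doesn't copy
modify(big)  # tracemem reports a duplication at the assignment
untracemem(big)
```

So passing a large object in and out of an inner function is cheap under either approach, as long as that function only reads the data.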
Upvotes: 11
Reputation: 60492
You would typically use approach 2. Some exceptions are:

Function closures:
f = function() {
  counter = 1
  g = function() {
    counter <<- counter + 1
    return(counter)
  }
  g  # return the inner function; it retains access to `counter`
}
counter = f()
counter()
counter()
Function closures enable us to remember state between calls.
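The same closure pattern extends naturally to memoisation, where the inner function caches results in its enclosing environment via `<<-` (a hypothetical sketch; `make_memo_square` is an invented name, not from the answers):

```r
# Closure-based memoisation: `cache` lives in the enclosing
# function's environment and is updated from the inner function
# with `<<-`.
make_memo_square <- function() {
  cache <- list()
  function(n) {
    key <- as.character(n)
    if (is.null(cache[[key]])) {
      cache[[key]] <<- n^2  # stand-in for an expensive computation
    }
    cache[[key]]
  }
}

msq <- make_memo_square()
msq(4)  # computed on the first call
msq(4)  # served from the cache on repeat calls
```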
Sometimes it's handy to define a function where it is used, because it is only needed in that one place. For example, when using `optim`, we often tweak an existing function:
pdf = function(x, mu) dnorm(x, mu, log=TRUE)
f = function(d, lower, initial=0) {
  ll = function(mu) {
    # optim() minimises, so penalise values below the bound with +Inf
    if(mu < lower) return(Inf)
    else -sum(pdf(d, mu))
  }
  optim(initial, ll)
}
f(d, lower=1.5, initial=2)  # the initial value must respect the bound
The `ll` function uses the data set `d` and a lower bound. This is convenient, since this may be the only place we use/need the `ll` function.
Upvotes: 5