Is there any trade-off between sourcing and performance?

Question

Does the excessive use of source() to use functions in multiple .R files come with a performance trade-off?

In other words, does the code run significantly faster when it is in a single .R file (that's when it's generally not very beautiful to look at) compared to when there are chunks scattered around in multiple files?

npjc · Accepted Answer

I think you should really ask what kind of "gains" you stand to make from such a thing versus the clarity of the code, but just for completeness...

Let's compare:

Here we source 10 functions, either with a single source() call on long.r or with three calls on short1.r, short2.r and short3.r.

in long.r:

# my long source file
fun1 <- function(x) x
fun2 <- function(x) x
fun3 <- function(x) x
fun4 <- function(x) x
fun5 <- function(x) x
fun6 <- function(x) x
fun7 <- function(x) x
fun8 <- function(x) x
fun9 <- function(x) x
fun10 <- function(x) x

in short1.r:

# my short source file
fun1 <- function(x) x
fun2 <- function(x) x
fun3 <- function(x) x

in short2.r:

fun4 <- function(x) x
fun5 <- function(x) x
fun6 <- function(x) x

in short3.r:

fun7 <- function(x) x
fun8 <- function(x) x
fun9 <- function(x) x
fun10 <- function(x) x

Benchmarking:

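library(microbenchmark)

# one call to source() for all ten functions
src_long <- function() {
    source("long.r")
}

# three calls to source() for the same ten functions
src_shorts <- function() {
    source("short1.r")
    source("short2.r")
    source("short3.r")
}

microbenchmark(src_long(), src_shorts())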

On my machine I get:

Unit: microseconds
        expr      min       lq    median        uq      max neval
  src_long()  691.690  733.271  763.3405  806.3555 3242.216   100
src_shorts() 1354.356 1431.011 1476.2555 1541.9445 3528.760   100

So it takes roughly twice as long (median of about 1476 vs. 763 microseconds) when you have 3 calls to source() instead of 1. Presumably this is the fixed per-call overhead of source() itself (the if/else checks it runs on every call), which is paid three times instead of once. An extra ~700 microseconds is nothing to write home about, so go with whatever gives the clearest code.
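If you want to repeat the comparison without creating the files by hand, here is a minimal sketch. It is not the exact setup used above: the helper write_funs(), the temporary directory and the repetition count n are my own illustrative choices, it times with base R's system.time() instead of microbenchmark, and it sources into throwaway environments so your workspace stays clean. The numbers will differ from machine to machine.

```r
# Sketch: recreate long.r and short1-3.r in a temporary directory.
# write_funs() is a hypothetical helper, not part of any package.
tmp <- tempfile("source-bench-")
dir.create(tmp)

# Write one trivial definition per id (fun1, fun2, ...) to `path`.
write_funs <- function(path, ids) {
    writeLines(sprintf("fun%d <- function(x) x", ids), path)
    path
}

long <- write_funs(file.path(tmp, "long.r"), 1:10)
shorts <- c(
    write_funs(file.path(tmp, "short1.r"), 1:3),
    write_funs(file.path(tmp, "short2.r"), 4:6),
    write_funs(file.path(tmp, "short3.r"), 7:10)
)

# Source into a fresh environment so nothing leaks into the global workspace.
src_long <- function() {
    env <- new.env()
    source(long, local = env)
    env
}

src_shorts <- function() {
    env <- new.env()
    for (f in shorts) source(f, local = env)
    env
}

# Both approaches should define the same ten functions.
stopifnot(setequal(ls(src_long()), ls(src_shorts())))

# A single call is tiny, so repeat it n times and report the per-call average.
n <- 200
t_long <- system.time(for (i in seq_len(n)) src_long())[["elapsed"]]
t_shorts <- system.time(for (i in seq_len(n)) src_shorts())[["elapsed"]]
cat(sprintf("per call: long = %.0f us, shorts = %.0f us\n",
            1e6 * t_long / n, 1e6 * t_shorts / n))
```

The loop over shorts is also how I would source several files in practice: the cost is paid once at startup, so the layout that is easiest to read wins.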