Reputation: 4273
Say I have an external R script, external.R:
# name the columns so that aes(x = x, y = y) in main.Rmd can find them
df.rand <- data.frame(x = rnorm(n = 100), y = rnorm(n = 100))
Then there's a main.Rmd:
\documentclass{article}
\begin{document}
<<setup, include = FALSE>>=
library(knitr)
library(ggplot2)
# global chunk options
opts_chunk$set(cache=TRUE, autodep=TRUE, concordance=TRUE, progress=TRUE, cache.extra = tools::md5sum("external.R"))
@
<<source, include=FALSE>>=
source("external.R")
@
<<plot>>=
ggplot(data = df.rand, mapping = aes(x = x, y = y)) + geom_point()
@
\end{document}
It's helpful to have this in an external script because, in reality, it's a bunch of import, data-cleaning, and simulation tasks that would clutter main.Rmd.
All chunks in main.Rmd depend on changes in the external script.
To account for this dependency, I added cache.extra = tools::md5sum("external.R") to the global chunk options above. (The file name must match the script's name exactly, including case.)
That seems to work OK.
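To illustrate why this option invalidates the cache, here is a minimal sketch in plain R. It uses a temporary file as a hypothetical stand-in for external.R, and all variable names are made up for illustration. The idea is that the value passed to cache.extra changes whenever the file's contents change.

```r
# Hypothetical stand-in for external.R, written to a temporary file
script <- tempfile(fileext = ".R")
writeLines("df.rand <- data.frame(x = rnorm(100), y = rnorm(100))", script)
hash_before <- unname(tools::md5sum(script))

# Any edit to the file, even an added comment, yields a different hash
cat("# a harmless comment\n", file = script, append = TRUE)
hash_after <- unname(tools::md5sum(script))
hash_changed <- !identical(hash_before, hash_after)

# md5sum() returns NA for a path that does not exist
hash_missing <- unname(tools::md5sum(file.path(tempdir(), "no-such-file.R")))
```

Because a missing path gives NA rather than an error, a misspelled file name would presumably produce a constant value and never invalidate anything, so it is worth checking the path.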
I'm looking for best practices (with my approach, any change to external.R will trigger a complete cache invalidation, rather than invalidating only those objects that actually change). There are no side effects (except for some library() calls, but I can move them to main.Rmd).
I'm always worried that I'm somehow doing it wrong.
Upvotes: 4
Views: 281
Reputation: 14957
There should be better approaches than the do-it-yourself caching you currently use. To start with, you could split external.R into chunks:
# ---- CreateRandomDFs----
df.rand1 <- data.frame(rnorm(n = 100), rnorm(n = 100))
df.rand2 <- data.frame(rnorm(n = 100), rnorm(n = 100))
# ---- CreateOtherObjects----
# stuff
In main.Rmd, add read_chunk(path = 'external.R') in an uncached chunk (!). Then execute the chunks:
<<CreateRandomDFs>>=
@
<<CreateOtherObjects>>=
@
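As a sketch of the uncached chunk that reads the script, something like the following could go before the two chunks above. The chunk label read-external is a made-up name; cache=FALSE is set explicitly because the question's setup chunk turns caching on globally.

```rnw
<<read-external, cache=FALSE, include=FALSE>>=
# Register the labelled chunks from external.R without running them yet.
# This must run on every knit, so the chunk itself is not cached.
library(knitr)
read_chunk(path = "external.R")
@
```

The empty chunks <<CreateRandomDFs>>= and <<CreateOtherObjects>>= then pick up their code from the sections with matching labels in external.R.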
If autodep doesn't work, add dependson to your chunks. A chunk that only uses df.rand1 and df.rand2 gets dependson = "CreateRandomDFs"; when other objects are also used, set dependson = c("CreateRandomDFs", "CreateOtherObjects").
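For example, a downstream chunk with an explicit dependency might be sketched like this. The label summary-dfs and the summary() calls are only illustrative; the dependency labels are the ones from external.R above.

```rnw
<<summary-dfs, dependson="CreateRandomDFs">>=
# Re-run whenever the cache of the CreateRandomDFs chunk is rebuilt
summary(df.rand1)
summary(df.rand2)
@
```

With this in place, a change to the code in the CreateOtherObjects section should leave this chunk's cache alone.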
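You may also invalidate a chunk's cache when a certain object changes: cache.whatever = quote(df.rand1). The option name here is arbitrary; as far as I know, knitr takes all chunk options into account when deciding whether a cache is still valid, so any name that does not clash with a built-in option should do.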
This way, you avoid invalidating the whole cache whenever anything in external.R changes. How you split the code in that file into chunks is crucial: with too many chunks, you will have to list many dependencies; with too few, the cache gets invalidated more often than necessary.
Upvotes: 3