Suppose that I use `knitr` and have a chunk that takes a while to run. I want this chunk to update if a file changes, but not if I, e.g., change `fig.path`. The latter suggests that I should set the `cache` chunk option to `1`, but then I cannot use a checksum as suggested here.
Here is an example R Markdown file:
````rmd
---
title: "Example"
author: "Benjamin Christoffersen"
date: "September 2, 2018"
output: html_document
---

```{r setup, include=FALSE}
data_file <- "~/data.RDS"
knitr::opts_chunk$set(echo = TRUE, cache.extra = tools::md5sum(data_file))
```

```{r load_data}
dat <- readRDS(data_file)
```

```{r large_computation, cache = 1}
Sys.sleep(10)
Sys.time() # just to show that the result does not change
```

```{r make_some_plot}
hist(dat)
```
````
Running `set.seed(1); saveRDS(rnorm(100), "~/data.RDS")` and knitting, then running `set.seed(2); saveRDS(rnorm(100), "~/data.RDS")` and knitting again, shows that `large_computation` is not updated. That is expected, since `cache.extra` is not in the `knitr:::cache1.opts` vector.

Of course, I could save the `md5sum` result, compare it with the previously stored value, and use `cache.rebuild`, or do something similar in the `large_computation` chunk, but a `knitr` solution would be nicer. I often change some chunk options (e.g., `dpi`, `fig.width`, and `fig.height`), so using `cache = TRUE` will not work. I guess one could modify the package so that options can be added to `knitr:::cache1.opts`.
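The manual workaround mentioned above (store the checksum, compare it with the previous one, then set `cache.rebuild`) could look roughly like the sketch below. It assumes a hypothetical helper `file_changed()` and a stamp file stored next to the data file (e.g. `data.RDS.md5`). Neither is part of `knitr`.

```r
# Hypothetical helper (not part of knitr): returns TRUE if the contents of
# `path` changed since the previous call, FALSE otherwise. The last seen MD5
# checksum is kept in `stamp_file`.
file_changed <- function(path, stamp_file = paste0(path, ".md5")) {
  current <- unname(tools::md5sum(path))
  previous <- if (file.exists(stamp_file)) {
    readLines(stamp_file, warn = FALSE)
  } else {
    NA_character_  # no stamp yet: treat the file as changed
  }
  changed <- !identical(current, previous)
  if (changed) writeLines(current, stamp_file)  # remember the new checksum
  changed
}
```

In the R Markdown file one could then compute `data_changed <- file_changed(data_file)` once in the uncached `setup` chunk and pass `cache.rebuild = data_changed` to the `large_computation` chunk. Note that the stamp is updated as soon as the helper runs, so if knitting fails after that point, the next run will not rebuild the chunk.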
I found another workaround for `cache.extra` being ignored when `cache = 1` or `cache = 2`. Insert the following hook into the setup chunk. It prepends a comment containing the `cache.extra` value to the chunk's code, so that the cache is invalidated whenever `cache.extra` changes.
```r
knitr::opts_hooks$set(cache.extra = function(options) {
  # invalidate cache: prepend the cache.extra value to the chunk's code
  options$code <- c(sprintf("# cache.extra: %s", options$cache.extra), options$code)
  options
})
```
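To see what the hook does, the same function body can be called outside `knitr` on a hand-made options list. This is only a sketch: the name `cache_extra_hook` and the strings `"abc123"`/`"def456"` (standing in for two different `tools::md5sum()` results) are illustrative.

```r
# Same body as the hook above, bound to a hypothetical name so it can be
# called directly; knitr would pass the chunk's options list to it.
cache_extra_hook <- function(options) {
  # Prepend a comment carrying the cache.extra value to the chunk's code
  options$code <- c(sprintf("# cache.extra: %s", options$cache.extra), options$code)
  options
}

# Stand-in for a chunk's options; the two cache.extra values play the role
# of the data file's checksum before and after it changed
opts <- list(code = c("Sys.sleep(10)", "Sys.time()"), cache.extra = "abc123")
out1 <- cache_extra_hook(opts)
opts$cache.extra <- "def456"
out2 <- cache_extra_hook(opts)

out1$code
identical(out1$code, out2$code)  # FALSE: the code text now differs
```

Because the code text itself changes whenever `cache.extra` changes, the chunk's cache key presumably changes as well even with `cache = 1`, since the code is hashed at every cache level. A side effect is that with `echo = TRUE` the extra comment line will likely also show up in the rendered output.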
If I understand the question correctly, the problem is that `cache.extra` is not taken into account if `cache` is set to `1`. In fact, this is by design.
The desired behavior is to invalidate the cache of all chunks (including chunks with `cache = 1`) if an external file (or, more generally, some value provided to `cache.extra`) changes.
As mentioned in the question, one way to achieve this is the chunk option `cache.rebuild`, but instead of manually keeping track of changes in the external file, I'd take advantage of `knitr`'s built-in caching capabilities:
```{r cachecontrol, cache = TRUE, cache.extra = tools::md5sum(data_file)}
knitr::opts_chunk$set(cache.rebuild = TRUE)
```
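When this chunk is added early in the document, the cache of all subsequent chunks is invalidated if `data_file` changes. The idea is to cache the chunk that controls caching of subsequent chunks, but only as long as the external file is unchanged. If the file is unchanged, the chunk is loaded from the cache and its code is not run, so `cache.rebuild` is not set. If the file changes, the chunk is re-evaluated and sets `cache.rebuild = TRUE` for all chunks after it.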
Of course, this only works if no global chunk options are changed before the `cachecontrol` chunk is evaluated.
Full example from the question: run `set.seed(1); saveRDS(rnorm(100), "data.RDS")` with different seeds to generate different external files, then knit:
````rmd
---
title: "Invalidate all chunks conditional on external file (even if cache=1)"
output: html_document
---

```{r}
data_file <- "data.RDS"
```

```{r cachecontrol, include = FALSE, cache = TRUE, cache.extra = tools::md5sum(data_file)}
# do NOT change global chunk options before this chunk
knitr::opts_chunk$set(cache.rebuild = TRUE)
```

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.width = 8)
```

```{r load_data}
dat <- readRDS(data_file)
```

```{r large_computation, cache = 1}
Sys.sleep(10)
Sys.time() # just to show that the result does not change unless the external file changes
```

```{r make_some_plot}
hist(dat)
```
````