Reputation: 1954
I have been using rmarkdown
/knitr
's knit to html
capability to generate html code for some blogs. I've found it extremely helpful and convenient, but have been running into some problems lately with file size.
When I knit a script that has graphics that use shapefiles or ggmap
images, the html file gets too big for the blog host to make sense of it (I've tried with both blogger and wordpress). I believe this has to do with the relatively large data.frames/files that are the shapefiles/ggmap
being put into html form. Is there anything I can do to get a smaller html file that can be parsed by a blog host?
For reference, the html output from an rmarkdown script with one graphic using a ggmap
layer, a layer of shapefiles and some data is 1.90MB, which is too big for blogger or wordpress to handle in html input. Thanks for any ideas.
Upvotes: 5
Views: 3933
Reputation: 4624
Below are 3 different options to help you reduce the file size of HTML files with encoded images.
You can run this Python script on an existing HTML file. The script will:
Usage:
python optimize_html.py infile.html
It writes output to infile-optimized.html
.
knitr 1.15 includes a hook called hook_optipng
that will run the optipng program on generated PNG files to reduce file size.
Here is a .Rmd
example (taken from: knitr-examples/035-optipng.Rmd):
# 035-optipng.Rmd
This demo shows you how to optimize PNG images with `optipng`.
```{r setup}
library(knitr)
knit_hooks$set(optipng = hook_optipng)
```
Now we set the chunk option `optipng` to a non-`NULL` value,
e.g. `optipng=''`, to activate the hook. This string is passed to
`optipng`, so you can use `optipng='-o7'` to optimize more heavily.
```{r use-optipng, optipng=''}
library(methods)
library(ggplot2)
set.seed(123)
qplot(rnorm(1e3), rnorm(1e3))
```
Writing your own hook is also quite easy, so I wrote a hook that calls the pngquant program. I find that pngquant
runs faster, and the output files are smaller and look better.
Here is a .R
example that defines and uses hook_pngquant
(taken from this gist).
#' ---
#' title: "pngquant demo"
#' author: "Kamil Slowikowski"
#' date: "`r Sys.Date()`"
#' output:
#' html_document:
#' self_contained: true
#' ---
#+ setup, include=FALSE
library(knitr)
# Functions taken from knitr/R/utils.R
all_figs = function(options, ext = options$fig.ext, num = options$fig.num) {
fig_path(ext, options, number = seq_len(num))
}
in_dir = function(dir, expr) {
if (!is.null(dir)) {
owd = setwd(dir); on.exit(setwd(owd))
}
wd1 = getwd()
res = expr
wd2 = getwd()
if (wd1 != wd2) warning(
'You changed the working directory to ', wd2, ' (probably via setwd()). ',
'It will be restored to ', wd1, '. See the Note section in ?knitr::knit'
)
res
}
is_windows = function() .Platform$OS.type == 'windows'
in_base_dir = function(expr) {
d = opts_knit$get('base.dir')
if (is.character(d) && !file_test('-d', d)) dir.create(d, recursive = TRUE)
in_dir(d, expr)
}
# Here is the code you can modify to use any image optimizer.
hook_pngquant <- function(before, options, envir) {
if (before)
return()
ext = tolower(options$fig.ext)
if (ext != "png") {
warning("this hook only works with PNG")
return()
}
if (!nzchar(Sys.which("pngquant"))) {
warning("cannot find pngquant; please install and put it in PATH")
return()
}
paths = all_figs(options, ext)
in_base_dir(lapply(paths, function(x) {
message("optimizing ", x)
cmd = paste(
"pngquant",
if (is.character(options$pngquant)) options$pngquant,
shQuote(x)
)
message(cmd)
(if (is_windows())
shell
else system)(cmd)
x_opt = sub("\\.png$", "-fs8.png", x)
file.rename(x_opt, x)
}))
return()
}
# Enable this hook in this R script.
knit_hooks$set(
pngquant = hook_pngquant
)
#' Here we set the chunk option `pngquant='--speed=1 --quality=0-50'`,
#' which activates the hook.
#+ use-pngquant, pngquant='--speed=1 --quality=0-50'
library(methods)
library(ggplot2)
set.seed(123)
qplot(rnorm(1e3), rnorm(1e3))
I prefer to write my reports in R scripts (.R
) instead of R markdown documents (.Rmd
). See http://yihui.name/knitr/demo/stitch/ for more information on how to do that.
Upvotes: 4
Reputation: 4970
One thing you could do would be to not use embedded image and other resources. To achieve this, you can set the self_contained
option in the YAML header for your document to false
, e.g.:
---
output:
html_document:
self_contained: false
---
More info here: http://rmarkdown.rstudio.com/html_document_format.html
Upvotes: 2