moman822

Reputation: 1954

knitr HTML output too large

I have been using rmarkdown/knitr's knit-to-HTML capability to generate HTML for some blog posts. It has been extremely helpful and convenient, but lately I have been running into problems with file size.

When I knit a script with graphics that use shapefiles or ggmap images, the HTML file gets too big for the blog host to parse (I've tried both Blogger and WordPress). I believe this is because the relatively large data.frames/files behind the shapefiles/ggmap layers end up encoded in the HTML. Is there anything I can do to get a smaller HTML file that a blog host can parse?

For reference, the HTML output from an rmarkdown script with one graphic (a ggmap layer, a layer of shapefiles and some data) is 1.90 MB, which is too big for Blogger or WordPress to handle as HTML input. Thanks for any ideas.

Upvotes: 5

Views: 3933

Answers (2)

Kamil Slowikowski

Reputation: 4624

Below are 3 different options to help you reduce the file size of HTML files with encoded images.


1. Optimize an existing HTML file

You can run this Python script on an existing HTML file. The script will:

  • decode the base64 encoded images
  • run pngquant to optimize the images
  • re-encode the optimized images as base64

Usage:

python optimize_html.py infile.html

It writes output to infile-optimized.html.
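The script itself is not reproduced here, so the following is only a minimal sketch of the three steps listed above, not the original code. The names `DATA_URI`, `pngquant` and `optimize_html` are hypothetical, and the sketch assumes the images are embedded as `data:image/png;base64,...` URIs and that the `pngquant` binary is on the PATH (if it is not, the images are left untouched).

```python
import base64
import re
import shutil
import subprocess

# Matches a base64-encoded PNG data URI (assumed form of the embedded images).
DATA_URI = re.compile(r"data:image/png;base64,([A-Za-z0-9+/=]+)")


def pngquant(png_bytes):
    """Pipe PNG bytes through pngquant; return the input unchanged on failure."""
    if shutil.which("pngquant") is None:
        return png_bytes
    # "pngquant -" reads the image from stdin and writes the result to stdout.
    result = subprocess.run(
        ["pngquant", "-"],
        input=png_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0 or not result.stdout:
        return png_bytes
    return result.stdout


def optimize_html(html, optimize=pngquant):
    """Decode each embedded PNG, optimize it, and re-encode it as base64."""

    def replace(match):
        original = base64.b64decode(match.group(1))
        optimized = optimize(original)
        # Keep the original image if optimizing did not make it smaller.
        if len(optimized) >= len(original):
            return match.group(0)
        encoded = base64.b64encode(optimized).decode("ascii")
        return "data:image/png;base64," + encoded

    return DATA_URI.sub(replace, html)
```

To process a file, read it into a string, pass it to `optimize_html`, and write the result to a new file such as infile-optimized.html. The `optimize` argument is there so that a different optimizer can be swapped in without touching the decode/re-encode logic.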


2. Use the built-in knitr hook for optimizing PNG images

knitr 1.15 includes a hook called hook_optipng that will run the optipng program on generated PNG files to reduce file size.

Here is a .Rmd example (taken from: knitr-examples/035-optipng.Rmd):

# 035-optipng.Rmd

This demo shows you how to optimize PNG images with `optipng`.

```{r setup}
library(knitr)
knit_hooks$set(optipng = hook_optipng)
```

Now we set the chunk option `optipng` to a non-`NULL` value,
e.g. `optipng=''`, to activate the hook. This string is passed to
`optipng`, so you can use `optipng='-o7'` to optimize more heavily.

```{r use-optipng, optipng=''}
library(methods)
library(ggplot2)
set.seed(123)
qplot(rnorm(1e3), rnorm(1e3))
```

3. Write your own knitr hook for any image optimizer

Writing your own hook is also quite easy, so I wrote one that calls the pngquant program. I find that pngquant runs faster than optipng, and its output files are smaller and look better.

Here is a .R example that defines and uses hook_pngquant (taken from this gist).

#' ---
#' title: "pngquant demo"
#' author: "Kamil Slowikowski"
#' date: "`r Sys.Date()`"
#' output:
#'   html_document:
#'     self_contained: true
#' ---

#+ setup, include=FALSE
library(knitr)

# Functions taken from knitr/R/utils.R
all_figs = function(options, ext = options$fig.ext, num = options$fig.num) {
  fig_path(ext, options, number = seq_len(num))
}
in_dir = function(dir, expr) {
  if (!is.null(dir)) {
    owd = setwd(dir); on.exit(setwd(owd))
  }
  wd1 = getwd()
  res = expr
  wd2 = getwd()
  if (wd1 != wd2) warning(
    'You changed the working directory to ', wd2, ' (probably via setwd()). ',
    'It will be restored to ', wd1, '. See the Note section in ?knitr::knit'
  )
  res
}
is_windows = function() .Platform$OS.type == 'windows'
in_base_dir = function(expr) {
  d = opts_knit$get('base.dir')
  if (is.character(d) && !file_test('-d', d)) dir.create(d, recursive = TRUE)
  in_dir(d, expr)
}

# Here is the code you can modify to use any image optimizer.
hook_pngquant <- function(before, options, envir) {
  if (before)
    return()
  ext = tolower(options$fig.ext)
  if (ext != "png") {
    warning("this hook only works with PNG")
    return()
  }
  if (!nzchar(Sys.which("pngquant"))) {
    warning("cannot find pngquant; please install and put it in PATH")
    return()
  }
  paths = all_figs(options, ext)
  in_base_dir(lapply(paths, function(x) {
    message("optimizing ", x)
    cmd = paste(
      "pngquant",
      if (is.character(options$pngquant)) options$pngquant,
      shQuote(x)
    )
    message(cmd)
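    # shell() only exists on Windows; use system() everywhere else
    run = if (is_windows()) shell else system
    run(cmd)
    # pngquant writes "<name>-fs8.png" by default; move it over the original
    x_opt = sub("\\.png$", "-fs8.png", x)
    if (file.exists(x_opt)) file.rename(x_opt, x)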
  }))
  return()
}

# Enable this hook in this R script.
knit_hooks$set(
  pngquant = hook_pngquant
)

#' Here we set the chunk option `pngquant='--speed=1 --quality=0-50'`,
#' which activates the hook.

#+ use-pngquant, pngquant='--speed=1 --quality=0-50'
library(methods)
library(ggplot2)
set.seed(123)
qplot(rnorm(1e3), rnorm(1e3))

I prefer to write my reports in R scripts (.R) instead of R markdown documents (.Rmd). See http://yihui.name/knitr/demo/stitch/ for more information on how to do that.

Upvotes: 4

Keith Hughitt

Reputation: 4970

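One option is to stop embedding images and other resources in the HTML file. To do this, set the self_contained option in the YAML header of your document to false. The figures are then written as separate files that the HTML links to, so they have to be uploaded alongside it. For example: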

---
output:
  html_document:
    self_contained: false
---

More info here: http://rmarkdown.rstudio.com/html_document_format.html
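To check whether a knitted file still carries embedded resources, it can help to count the base64 data URIs in it. This is a rough sketch, not part of rmarkdown; the name `embedded_payload` is hypothetical, and the pattern assumes resources are embedded in the common `data:<type>/<subtype>;base64,...` form.

```python
import re

# Matches base64 data URIs such as data:image/png;base64,....
DATA_URI = re.compile(r"data:[\w.+-]+/[\w.+-]+;base64,([A-Za-z0-9+/=]+)")


def embedded_payload(html):
    """Return (number of data URIs, total length of their base64 text)."""
    payloads = DATA_URI.findall(html)
    return len(payloads), sum(len(p) for p in payloads)
```

A file knitted with self_contained: false should report few or no data URIs, while a self-contained one will typically show most of its size in the second number.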

Upvotes: 2
