wm3

Reputation: 113

How can I create a 1000-page pdf faster?

I need to plot more than 1000 pages to a PDF file using ggplot2 in R. Is there a faster way to do this than the following code?

library(ggplot2)
data(diamonds)
pdf("name.pdf", width = 6, height = 6)
for(i in 1:1000) {
  p1 <- ggplot(diamonds, aes(x = carat,  y = price)) +
        geom_point()
  print(p1)
}
dev.off()

My actual case looks like this:

(1) Read a file and create a data.frame from the values on each line.

(2) Plot each line of that file to its own page of the PDF.

fa <- read.table(file)
pdf("name.pdf", width = 6, height = 4)
for(i in 1:nrow(fa)) {
  # pseudocode: build a data.frame from line i of fa
  new.data <- function(i)
  p1 <- ggplot(new.data,...) + ...
  print(p1)
}
dev.off()

Upvotes: 3

Views: 492

Answers (2)

wm3

Reputation: 113

Thanks to @Carl Witthoft's suggestion, I will use parallel + foreach for my task. Here is an example, using simpler plots than in my real task.

The idea: do the data processing and plot construction in parallel, store the plots in a list (which may be very large), and finally print all the figures to a single PDF file.

library("ggplot2")
library("lattice")
data(diamonds)
gg_plot <- function() {
  cat(".")
  for(i in 1:5) {
    fig <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
    print(fig)
  }
}

para_plot <- function() {
  cat("+")
  library(foreach)
  library(doParallel)
  library(ggplot2)
  # start a 2-worker cluster and register it as the foreach backend
  cl <- makeCluster(2)
  registerDoParallel(cl, cores = 2)
  # the workers only build the ggplot objects; nothing is drawn yet
  cTime <- system.time(
    AllFigs <- foreach(i = 1:5, .packages = c("ggplot2")) %dopar% {
      fig <- ggplot(mtcars, aes(x = mpg, y = disp)) + geom_point()
      #fig <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
      fig
    }
  )
  stopCluster(cl)
  # drawing happens here, serially, on the pdf device opened by wrap()
  print(AllFigs)
}

wrap <- function(f,npages=20,fn="name.pdf") {
  pdf(fn, width = 6, height = 6) 
  for(i in 1:npages) {
    f()
  }
  dev.off()
  unlink(fn)
}

library("rbenchmark")
benchmark(wrap(gg_plot), wrap(para_plot), replications=10)

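Yes, in this test the parallel version is about twice as fast as the normal one, but I feel it still needs improvement. (Note that para_plot plots mtcars while gg_plot plots diamonds, so the two timings are not strictly comparable.)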

             test replications elapsed relative user.self sys.self user.child sys.child
1   wrap(gg_plot)           10 620.109    1.937   611.018    5.125      0.000     0.000
2 wrap(para_plot)           10 320.081    1.000   138.696    5.475      0.349     1.931

Upvotes: 2

Ben Bolker

Reputation: 226692

As commented above, speed is one of ggplot2's weaknesses. It takes some work but you can often replicate the appearance of a ggplot in one of the other standard plotting packages (base or lattice); e.g. this series of blog posts goes the other way (from lattice to ggplot), but the examples should be helpful. (@G.Grothendieck comments below that library(latticeExtra); xyplot(y ~ x, diamonds, par.settings = ggplot2like(), lattice.options = ggplot2like.opts()) will generate ggplot-like plots.)

If you were really desperate I suppose you could use parallel::parApply to generate a sensible number of separate PDFs and then use external tools such as pdftk to stitch them together ...
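A minimal sketch of that idea, under stated assumptions: it uses parallel::parLapply rather than parApply (the work is a list of page ranges, not a matrix), and the page count, chunk count, the helper write_chunk and the part_NN.pdf file names are all made up for illustration. Each worker opens its own pdf() device and writes one multi-page file; the stitching step only runs if a pdftk binary is found on the PATH.

```r
library(parallel)

n_pages   <- 8   # illustrative; the real task has 1000+ pages
n_chunks  <- 2   # one multi-page PDF per chunk
page_sets <- split(seq_len(n_pages),
                   cut(seq_len(n_pages), n_chunks, labels = FALSE))
out_dir   <- tempdir()

# Hypothetical helper: writes the pages of chunk k to their own PDF
# and returns the file name.
write_chunk <- function(k, page_sets, out_dir) {
  library(ggplot2)  # each worker is a fresh R session
  fn <- file.path(out_dir, sprintf("part_%02d.pdf", k))
  pdf(fn, width = 6, height = 6)
  on.exit(dev.off())
  for (i in page_sets[[k]]) {
    print(ggplot(diamonds, aes(x = carat, y = price)) + geom_point())
  }
  fn
}

cl <- makeCluster(2)
parts <- unlist(parLapply(cl, seq_along(page_sets), write_chunk,
                          page_sets = page_sets, out_dir = out_dir))
stopCluster(cl)

# Stitch the parts together in order, if pdftk is installed.
if (nzchar(Sys.which("pdftk"))) {
  system2("pdftk", c(shQuote(parts), "cat", "output",
                     shQuote(file.path(out_dir, "name.pdf"))))
}
```

Because every worker draws to its own device, the rendering itself runs in parallel, unlike approaches that build the plot objects in parallel but print them all from one session. Whether that outweighs the cluster start-up and stitching overhead would need to be benchmarked on the real data.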

Set up machinery to generate (approximately) the same plots in all three systems:

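library("ggplot2")
library("lattice")
data(diamonds)
# each function draws one page and prints a progress marker
gg_plot <- function() {
  cat(".")
  print(ggplot(diamonds, aes(x = carat, y = price)) +
          geom_point())
}
base_plot <- function() {
  cat("+")
  # x and y here are diamonds' own columns named x and y, not carat and price
  plot(y ~ x, data = diamonds)
}
lattice_plot <- function() {
  cat("/")
  # lattice objects, like ggplot objects, are drawn only when printed
  print(xyplot(y ~ x, data = diamonds))
}
# write npages pages to a PDF using plotting function f, then delete the file
wrap <- function(f, npages = 20, fn = "name.pdf") {
  pdf(fn, width = 6, height = 6)
  for (i in 1:npages) {
    f()
  }
  dev.off()
  unlink(fn)
}

library("rbenchmark")
benchmark(wrap(gg_plot), wrap(base_plot), wrap(lattice_plot),
          replications = 10)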

OK, this was much slower than I expected (I cut it back to 20 pages per PDF and 10 replications). (I initially thought lattice won by a lot, but that's because I forgot to print() the results ...)

lattice is about twice as fast as ggplot, and base about 1.6 times as fast ...

                test replications elapsed relative user.self sys.self
2    wrap(base_plot)           10  75.693    1.249    74.053    1.596
1      wrap(gg_plot)           10 120.397    1.987   117.507    2.832
3 wrap(lattice_plot)           10  60.590    1.000    58.580    1.976

Upvotes: 7
