Reputation: 113
I need to plot more than 1000 pages to a PDF file using ggplot2
in R. Is there any faster way to do this than the following code?
library(ggplot2)
data(diamonds)

pdf("name.pdf", width = 6, height = 6)
for (i in 1:1000) {
  p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
    geom_point()
  print(p1)
}
dev.off()
My actual case is like this:
(1) read a file and create a data.frame from the values on each of its lines;
(2) make one plot per line of that file and write it to the PDF.
fa <- read.table(file)
pdf("name.pdf", width = 6, height = 4)
for (i in 1:nrow(fa)) {
  new.data <- make_data(fa[i, ])  # make_data() is a placeholder: builds a data.frame from row i
  p1 <- ggplot(new.data, ...) + ...
  print(p1)
}
dev.off()
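Here is a self-contained toy version of that workflow, with made-up columns standing in for my real file:

library(ggplot2)

fa <- data.frame(mean = rnorm(5), sd = runif(5, 0.5, 2))  # pretend input file
pdf("name.pdf", width = 6, height = 4)
for (i in 1:nrow(fa)) {
  new.data <- data.frame(x = rnorm(100, fa$mean[i], fa$sd[i]))  # per-row data
  p1 <- ggplot(new.data, aes(x = x)) + geom_histogram(bins = 20)
  print(p1)
}
dev.off()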
Upvotes: 3
Views: 492
Reputation: 113
Thanks to @Carl Witthoft's suggestion, I will use parallel
+ foreach
for my task. Here are examples; I use simpler plots to keep them short.
My idea: push the data computation out to parallel workers and collect the plots
in a list
(which may be very large), then print
all the figures to a PDF file at the end.
library("ggplot2")
library("lattice")
data(diamonds)
gg_plot <- function() {
cat(".")
for(i in 1:5) {
fig <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
print(fig)
}
}
para_plot <- function() {
  cat("+")
  library(foreach)
  library(doParallel)
  library(ggplot2)
  cl <- makeCluster(2)
  registerDoParallel(cl, cores = 2)
  AllFigs <- list()
  cTime <- system.time(
    AllFigs <- foreach(i = 1:5, .packages = c("ggplot2")) %dopar% {
      fig <- ggplot(mtcars, aes(x = mpg, y = disp)) + geom_point()
      # fig <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
      fig
    }
  )
  stopCluster(cl)
  print(AllFigs)  # rendering happens here, on the open PDF device
}
wrap <- function(f, npages = 20, fn = "name.pdf") {
  pdf(fn, width = 6, height = 6)
  for (i in 1:npages) {
    f()
  }
  dev.off()
  unlink(fn)  # discard the file; we only care about the timing
}
library("rbenchmark")
benchmark(wrap(gg_plot), wrap(para_plot), replications=10)
Yes, the parallel
version is about twice as fast as the normal one. But I feel it still needs improvement.
test replications elapsed relative user.self sys.self user.child sys.child
1 wrap(gg_plot) 10 620.109 1.937 611.018 5.125 0.000 0.000
2 wrap(para_plot) 10 320.081 1.000 138.696 5.475 0.349 1.931
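Applied to my real per-row case, the same pattern would look roughly like this (the input columns and the per-row data construction are made up for illustration): build every plot object in parallel, then render them serially into one PDF.

library(foreach)
library(doParallel)
library(ggplot2)

fa <- data.frame(mean = rnorm(1000), sd = runif(1000, 0.5, 2))  # pretend input

cl <- makeCluster(2)
registerDoParallel(cl)
## build all the plot objects in parallel ...
AllFigs <- foreach(i = 1:nrow(fa), .packages = "ggplot2") %dopar% {
  new.data <- data.frame(x = rnorm(100, fa$mean[i], fa$sd[i]))
  ggplot(new.data, aes(x = x)) + geom_histogram(bins = 20)
}
stopCluster(cl)

## ... then print them serially into a single PDF
pdf("name.pdf", width = 6, height = 4)
invisible(lapply(AllFigs, print))
dev.off()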
Upvotes: 2
Reputation: 226692
As commented above, speed is one of ggplot2's weaknesses. It takes some work, but you can often replicate the appearance of a ggplot in one of the other standard plotting packages (base or lattice); e.g. this series of blog posts goes the other way (from lattice to ggplot), but the examples should be helpful. (@G.Grothendieck comments below that library(latticeExtra); xyplot(y ~ x, diamonds, par.settings = ggplot2like(), lattice.options = ggplot2like.opts())
will generate ggplot-like plots; see the sketch just below.)
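A minimal sketch of that suggestion, written as a drop-in alongside the helpers defined further down (assumes latticeExtra is installed; the gglike_plot name is mine):

library(latticeExtra)  # also loads lattice; provides the ggplot2like() theme
library(ggplot2)       # for the diamonds data

gglike_plot <- function() {
  cat("~")
  print(xyplot(y ~ x, data = diamonds,
               par.settings = ggplot2like(),
               lattice.options = ggplot2like.opts()))
}

You could then pass wrap(gglike_plot) to the same benchmark below to compare it on an equal footing.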
If you were really desperate, I suppose you could use parallel::parApply
to generate a sensible number of separate PDFs and then use an external tool such as pdftk
to stitch them together ...
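A rough sketch of that idea, using clusterMap() rather than parApply() since the work splits naturally into (pages, output file) pairs; the chunk count and file names are illustrative, and the final call assumes pdftk is on the PATH:

library(parallel)

## render one block of pages into its own PDF; return the file name
plot_chunk <- function(pages, fn) {
  library(ggplot2)
  pdf(fn, width = 6, height = 6)
  for (i in pages) {
    print(ggplot(diamonds, aes(x = carat, y = price)) + geom_point())
  }
  dev.off()
  fn
}

chunks <- split(1:1000, cut(1:1000, 4))       # 4 blocks of 250 pages
files  <- sprintf("part_%02d.pdf", seq_along(chunks))

cl <- makeCluster(4)
clusterMap(cl, plot_chunk, chunks, files)     # one PDF per chunk, in parallel
stopCluster(cl)

## stitch the pieces into a single file
system(paste("pdftk", paste(files, collapse = " "), "cat output all.pdf"))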
Set up machinery to generate (approximately) the same plots in all three systems
library("ggplot2")
library("lattice")
data(diamonds)
gg_plot <- function() {
cat(".")
print(ggplot(diamonds, aes(x = carat, y = price)) +
geom_point())
}
base_plot <- function() {
cat("+")
plot(y~x,data=diamonds)
}
lattice_plot <- function() {
cat("/")
print(xyplot(y~x,data=diamonds))
}
wrap <- function(f,npages=20,fn="name.pdf") {
pdf(fn, width = 6, height = 6)
for(i in 1:npages) {
f()
}
dev.off()
unlink(fn)
}
library("rbenchmark")
benchmark(wrap(gg_plot),wrap(base_plot),wrap(lattice_plot),
replications=10)
OK, this was much slower than I expected (I cut it back to 20 pages per PDF and 10 replications). (I initially thought lattice
won by a lot, but that's because I forgot to print()
the results ...)
Per the table, lattice is about twice as fast as ggplot, and base about 1.6 times as fast ...
test replications elapsed relative user.self sys.self
2 wrap(base_plot) 10 75.693 1.249 74.053 1.596
1 wrap(gg_plot) 10 120.397 1.987 117.507 2.832
3 wrap(lattice_plot) 10 60.590 1.000 58.580 1.976
Upvotes: 7