Reputation: 73592
I want to run a CPU-intensive plotting function on a bunch of countries. I therefore tried to parallelize my code, but I don't get any output so far. I use x11() for tests and pdf() for the final result.
The normal code looks like this:
x11(width=7, height=7) # comment out for pdf output
# pdf("plot.pdf") # un-comment for pdf output
op <- par(mfrow=c(3, 3))
sapply(unique(dat$country), function(x)
with(dat[dat$country == x, ],
plot(year, value, type="l", # complicated plotting function
main=x, #
xlim=c(2014, 2019), ylim=0:1, col=2)) #
)
par(op)
# dev.off() # un-comment for pdf output
with this output:
Here is my attempt to parallelize:
library(parallel)
x11(width=7, height=7) # comment out for pdf output
# pdf("plot.pdf") # un-comment for pdf output
op <- par(mfrow=c(3, 3))
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("dat"), envir=environment())
parSapply(cl, seq_along(unique(dat$country)), function(x, ...) {
cntry <- unique(dat$country)[x] # x is an index, so look up the country first
with(dat[dat$country == cntry, ],
plot(year, value, type="l", # complicated plotting function
main=cntry, #
xlim=c(2014, 2019), ylim=0:1, col=2)) #
})
stopCluster(cl)
par(op)
# dev.off() # un-comment for pdf output
The code runs, but the output does not seem to reach the device. How could I fix that?
Data:
set.seed(42)
dat <- cbind.data.frame(expand.grid(country=replicate(9, paste(LETTERS[sample(seq(26), 2)], collapse="")),
year=2014:2019),
value=runif(54))
Upvotes: 0
Views: 100
Reputation: 1
Q : How could I fix that?
PROLOGUE :
While some process organisations allow several steps to start, execute and terminate in a true-[PARALLEL] fashion, many more cases simply cannot operate this way, and some explicitly require a pure-[SERIAL] organisation, where each step begins only after the previous one has completed. A "just"-[CONCURRENT] organisation of the process flow may harness some finite amount of available resources (memory I/O across "just" 3 memory channels, one of "just" 6 CPU cores, etc.) if and only if these are free, not otherwise.
All of this comes at a cost (for details on the add-on costs of parallelism, see parallelism-amdahl).
THIS SAID :
Plotting is a pure-[SERIAL] process: one step after another. From the printer device, through the printer interface, down to the printer task (a transaction) that interprets PostScript or another printing-device control language, everything is transaction-locked. No two documents get printed at once, only one after another, just as you would never draw two graphs at once on the same sheet of paper with just one pencil and one ruler.
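In R terms, each cluster worker is a separate R process with its own graphics devices, so a plot() issued on a worker does not reach the x11() or pdf() device opened in the main session. A minimal sketch, assuming the expensive part of the work can be separated from the drawing itself: the workers only prepare the data, and the main session draws the panels one after another. The helper prepare_country() is hypothetical and merely stands in for the real computation; the output path in tempdir() and the cluster size of 2 are likewise assumptions made for illustration.

```r
library(parallel)

# Example data from the question
set.seed(42)
dat <- cbind.data.frame(
  expand.grid(country = replicate(9, paste(LETTERS[sample(seq(26), 2)], collapse = "")),
              year = 2014:2019),
  value = runif(54))

countries <- as.character(unique(dat$country))

# Hypothetical helper: stands in for the CPU-intensive preparation.
# It returns plain data only and never touches a graphics device.
prepare_country <- function(x, dat) {
  d <- dat[dat$country == x, ]
  d <- d[order(d$year), ]
  list(country = x, year = d$year, value = d$value)
}

# Parallel part: each worker prepares the data for some of the countries.
cl <- makeCluster(2)  # assumed size; adjust to the machine
prepared <- parLapply(cl, countries, prepare_country, dat = dat)
stopCluster(cl)

# Serial part: one device, opened in the main session, drawn panel by panel.
out_file <- file.path(tempdir(), "plot.pdf")  # assumed output location
pdf(out_file)
op <- par(mfrow = c(3, 3))
for (p in prepared) {
  plot(p$year, p$value, type = "l", main = p$country,
       xlim = c(2014, 2019), ylim = 0:1, col = 2)
}
par(op)
dev.off()
```

Whether this pays off depends on how much of the run time is preparation rather than drawing: only the preparation is spread across workers, while the drawing stays sequential.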
That leaves two options. Either prepare the data beforehand and feed it into a [SERIAL] printing/plotting engine queue (one plot after another, without waiting for data to get processed, as these were prepared beforehand), or produce the {.PDF|.EPS|.PS} plots beforehand as individual plot files and use the {PDF|PS} language for the final document composition, which will, for obvious reasons, re-assemble the previously prepared files into one final output, composed in a pure-[SERIAL] manner.
The individual plot files may be produced in parallel, or more probably in some controlled "just"-[CONCURRENT] manner, since disk I/O and CPU cores are the obvious bottlenecks. When organising more work streams than there are CPU cores, some form of I/O-latency masking (such as async or deferred completion of I/O-bound functors, if the implementation language provides it) is about the most one can gain, and even that only if the add-on costs are justified, so that one does not pay far more than one receives.
Upvotes: 1