How to use the device for plotting in parallel?

Question

I want to run a CPU-intensive plotting function on a bunch of countries, so I attempted to parallelize my code; however, I don't get any output so far. I use x11() for testing and pdf() for the final result.

The normal (serial) code looks like this:

x11(width=7, height=7)  # comment out for pdf output
# pdf("plot.pdf")  # un-comment for pdf output
op <- par(mfrow=c(3, 3))
sapply(unique(dat$country), function(x)
  with(dat[dat$country == x, ], 
       plot(year, value, type="l",                 # complicated plotting function
            main=x,                                #
            xlim=c(2014, 2019), ylim=0:1, col=2))  #
  )
par(op)
# dev.off()  # un-comment for pdf output

This produces a 3 × 3 grid of line plots, one panel per country.

Here is my attempt to parallelize it:

library(parallel)
x11(width=7, height=7)  # comment out for pdf output
# pdf("plot.pdf")  # un-comment for pdf output
op <- par(mfrow=c(3, 3))
cl <- makeCluster(detectCores() - 1)
clusterExport(cl, c("dat"), envir=environment())
parSapply(cl, 1:length(unique(dat$country)), function(x, ...) {
  with(dat[dat$country == unique(dat$country)[x], ],  # x is an index, not a country name
       plot(year, value, type="l",                 # complicated plotting function
            main=unique(dat$country)[x],           #
            xlim=c(2014, 2019), ylim=0:1, col=2))  #
  })
stopCluster(cl)
par(op)
# dev.off()  # un-comment for pdf output

The code runs, but the output never seems to reach the device. How can I fix that?

Data

set.seed(42)
dat <- cbind.data.frame(expand.grid(country=replicate(9, paste(LETTERS[sample(seq(26), 2)], collapse="")),
                                    year=2014:2019),
                        value=runif(54))

user3666197 · Accepted Answer

Q : How could I fix that?

PROLOGUE :
While some process organisations allow steps to be started, executed and terminated in a True-[PARALLEL] fashion, many more cases simply cannot operate this way, not to mention those that explicitly require a pure-[SERIAL] organisation, one step after the previous one has completed. A "just"-[CONCURRENT] organisation of the process flow may harness some finite amount of available resources (memory I/O across "just" 3 memory channels, one of "just" 6 CPU cores present, etc.) if and only if these are free, not otherwise.

All of this, however, comes at a cost (for details on the add-on costs of parallelism, see the linked post in parallelism-amdahl).
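As a rough, simplified sketch of those costs (an overhead-extended form of Amdahl's law; the symbols are introduced here for illustration only), let p be the fraction of the work that can run in parallel, N the number of workers, and o the add-on overhead (spawning workers, shipping data with clusterExport(), collecting results), all expressed relative to the original serial runtime:

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N} + o}
\qquad\Longrightarrow\qquad
S(N) \le 1 \iff o \ge p\left(1 - \frac{1}{N}\right)
```

For example, with p = 0.9 and N = 8 workers, any overhead above 0.9 × 7/8 ≈ 0.79 of the serial runtime makes the parallel version no faster than the serial one.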

THIS SAID :
Plotting is inherently a pure-[SERIAL] process: one step after another applies here. From the printer device, through the printer interface, down to the printer task (a transaction) that interprets PostScript or another printing-device control language, everything is transaction-locked. No two documents ought to get printed at once, but rather one after another, just as you would never get two graphs drawn at once on the same paper if working with just one pencil and one ruler, would you?
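In R terms, a minimal sketch of why nothing reaches the x11() or pdf() device in the question, assuming the default PSOCK cluster that makeCluster() creates: each worker is a separate, non-interactive R process, so plot() called there opens that worker's own default device (usually pdf(), writing an Rplots.pdf file in the worker's working directory), never the device that is open in the master session.

```r
library(parallel)

cl <- makeCluster(2)

# Each worker is its own non-interactive R process. plot() here opens the
# worker's default graphics device; nothing is sent back to the master's
# x11() or pdf() device.
worker_devices <- clusterEvalQ(cl, {
  plot(1:3)                      # draws on the worker's own default device
  dev_name <- names(dev.cur())   # which device did the worker open?
  dev.off()                      # close it again
  dev_name
})

stopCluster(cl)

print(unlist(worker_devices))
```

The printed names are typically "pdf"; the exact device depends on each worker's R_DEFAULT_DEVICE setting.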

Options ?

One may prepare all the data beforehand (in parallel, provided the add-on costs of such a re-factored, re-organised processing flow are at least justified, i.e. you do not pay way more than you receive, so that the effective speedup does not drop to <= 1), and then feed these pre-baked, print-ready data into the principally pure-[SERIAL] printing/plotting engine queue (one after another, without waiting for any data to get processed, as these were prepared beforehand).
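A minimal sketch of this first option, using the question's data: the workers only compute and return plain data, and all drawing happens serially in the master process on its single device. The helper prepare_country() is hypothetical and stands in for whatever CPU-intensive preparation the real plotting function does; here it merely subsets and sorts.

```r
library(parallel)

# Data from the question
set.seed(42)
dat <- cbind.data.frame(expand.grid(country=replicate(9, paste(LETTERS[sample(seq(26), 2)], collapse="")),
                                    year=2014:2019),
                        value=runif(54))

# Hypothetical stand-in for the CPU-intensive part: it computes the
# print-ready data for one country and draws nothing.
prepare_country <- function(cty, dat) {
  d <- dat[dat$country == cty, ]
  d <- d[order(d$year), ]
  list(country = cty, year = d$year, value = d$value)
}

countries <- as.character(unique(dat$country))

# Parallel part: workers only return data to the master.
n_workers <- min(length(countries), max(1L, detectCores() - 1L, na.rm = TRUE))
cl <- makeCluster(n_workers)
prepared <- parLapply(cl, countries, prepare_country, dat = dat)
stopCluster(cl)

# Serial part: every panel is drawn in the master process, on its one device.
pdf("plot.pdf", width = 7, height = 7)
op <- par(mfrow = c(3, 3))
for (p in prepared) {
  plot(p$year, p$value, type = "l", main = p$country,
       xlim = c(2014, 2019), ylim = 0:1, col = 2)
}
par(op)
invisible(dev.off())
```

Swapping pdf() for x11(width=7, height=7) (and dropping the final dev.off()) gives the interactive test variant from the question.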

or

One may prepare all the {.PDF|.EPS|.PS} plots beforehand as individual plot files, in parallel, though more probably in some controlled, "just"-[CONCURRENT] manner: disk I/O and CPU cores are the obvious bottlenecks, and when organising more work streams than there are CPU cores, some form of I/O-latency masking (such as async/deferred completion of I/O-bound functors, if the implementation language provides it) is about the maximum one may benefit from. Again, this pays off only if all the add-on costs are at least justified, so that you do not pay way more than you receive. Then use the powers of the {PDF|PS} language for the final document composition, which, for obvious reasons, re-assembles the previously prepared files into one final output in a pure-[SERIAL] manner.
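A minimal sketch of this second option, again with the question's data: each worker opens, draws on and closes its own pdf() device, writing one file per country, and only the final merge runs serially. The helper plot_one() and the file names are hypothetical, and the merge step assumes the poppler `pdfunite` tool is installed; any other PDF-merging tool would serve the same purpose.

```r
library(parallel)

# Data from the question
set.seed(42)
dat <- cbind.data.frame(expand.grid(country=replicate(9, paste(LETTERS[sample(seq(26), 2)], collapse="")),
                                    year=2014:2019),
                        value=runif(54))

countries <- as.character(unique(dat$country))
out_dir <- tempdir()  # master's temp dir; local workers can write there too

# Hypothetical helper: each worker owns its pdf() device from open to close,
# so no plot ever has to reach the master's device.
plot_one <- function(cty, dat, out_dir) {
  f <- file.path(out_dir, paste0("plot_", cty, ".pdf"))
  pdf(f, width = 7, height = 7)
  d <- dat[dat$country == cty, ]
  plot(d$year, d$value, type = "l", main = cty,
       xlim = c(2014, 2019), ylim = 0:1, col = 2)
  dev.off()
  f  # return the file path to the master
}

n_workers <- min(length(countries), max(1L, detectCores() - 1L, na.rm = TRUE))
cl <- makeCluster(n_workers)
files <- unlist(parLapply(cl, countries, plot_one, dat = dat, out_dir = out_dir))
stopCluster(cl)

# Serial composition step, assuming poppler's `pdfunite` is on the PATH.
if (nzchar(Sys.which("pdfunite"))) {
  system2("pdfunite", c(shQuote(files), shQuote("plot.pdf")))
}
```

Unlike the question's layout, this gives one country per page; keeping the 3 × 3 grid would need a PDF tool that can impose several pages onto one sheet.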
