runr

Reputation: 1146

output of doParallel workers

Suppose I have code that performs various calculations in a foreach loop. For simplicity, let it be this:

foreach( idx = 1:10 ) %dopar% {
   set.seed(idx)
   smp <- rnorm(10)
   print( sprintf('Index: %d', idx) )
   cat( sprintf('Value: %f \n', smp[1:10]) )
}

The main idea is that the output is structured so that the index is printed once, followed by the results for that index. With %do%, a log file with this structure is easy to read and debug. With %dopar%, however, the workers' output is interleaved and the log file gets cluttered.

Question: Without changing the code, is there a way for each worker to create its own output file? I'm new to doParallel, and the only way of capturing parallel output I've found so far is the following:

library(doParallel)
cl <- makeCluster(4, outfile = 'log.txt')
registerDoParallel(cl)

However, outfile takes only a single string as its argument, and I haven't found a way of passing a vector of names to it. Is there a way to specify an output file for each worker?

NB: Working on Windows 7.

Upvotes: 1

Views: 2071

Answers (1)

CL.

Reputation: 14997

Many solutions come to mind, such as not using print in the workers or specifying the output file there, but the question explicitly asks that the code not be changed. I therefore assume that the code currently in the %dopar% body is wrapped in a function that cannot be changed:

doNotTouch <- function(idx) {
  set.seed(idx)
  smp <- rnorm(10)
  print(sprintf('Index: %d', idx))
  cat(sprintf('Value: %f \n', smp[1:10]))
}

One solution is to specify the output file directly before doNotTouch is called (or, more precisely, to execute the function in an environment where its output is redirected to a certain file):

library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)


foreach(idx = 1:10) %dopar% {
  capture.output(
    doNotTouch(idx), 
    file = paste0("log", idx, ".txt"))
}

stopCluster(cl)

This doesn't change the code presented in the question but rather puts it in a wrapper that takes care of the correct output file log[idx].txt (for example, log1.txt for idx = 1).

EDIT: In the comments, it was clarified that the question is less about writing to separate files and more about getting output that is not cluttered. In this case, return(capture.output(doNotTouch(idx))) could be used in the loop to collect the output lines that belong together and return them all at once. Moreover, @Nutle found out that Sys.getpid() returns a worker's PID, which can be used to write separate log files by worker (as opposed to by iteration).
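The two ideas from the edit could be sketched roughly as follows. This is only an illustration, not tested on Windows 7: the worker count of 2, the range 1:4, the variable logdir and the file name pattern log_[pid].txt are assumptions made for the example, and doNotTouch is repeated so that the snippet runs on its own.

```r
library(doParallel)

# Stand-in for the code that must not be changed (same as above)
doNotTouch <- function(idx) {
  set.seed(idx)
  smp <- rnorm(10)
  print(sprintf('Index: %d', idx))
  cat(sprintf('Value: %f \n', smp[1:10]))
}

cl <- makeCluster(2)        # assumption: 2 workers, just for the example
registerDoParallel(cl)

# Variant 1: return each iteration's output as a character vector,
# so the lines of one index stay together in the result list
out <- foreach(idx = 1:4) %dopar% {
  capture.output(doNotTouch(idx))
}

# Variant 2: one log file per worker, named after the worker's PID.
# logdir is resolved on the master so that all workers write to the same place.
logdir <- getwd()
pids <- foreach(idx = 1:4, .combine = c) %dopar% {
  logfile <- file.path(logdir, paste0("log_", Sys.getpid(), ".txt"))
  capture.output(doNotTouch(idx), file = logfile, append = TRUE)
  Sys.getpid()
}

stopCluster(cl)

# Print the collected output of variant 1 in index order
for (lines in out) writeLines(lines)
```

With variant 1, the master decides when and where the output is written, so nothing is interleaved. With variant 2, each file contains the output of every iteration that happened to run on that worker, appended in the order the worker processed them.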

Upvotes: 1
