bokov

Reputation: 3534

Error handling within parApply (in R, using parallel package)

I am trying to troubleshoot the following message I get when trying to use the parApply function from the parallel package:

Error in unserialize(node$con) : error reading from connection

The following is a mockup of what I'm doing:

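library(parallel)
# 16 workers; outfile='' lets worker output show up on the master's console
c0 <- makeCluster(16, outfile='')
clusterEvalQ(c0, library(survival))
aa <- array(rexp(1e4), c(100, 50, 2))
bb <- parApply(c0, aa, 1, function(ii) {
  # fit a Cox model per slice; fall back to NAs if the fit throws an error
  oo <- try(summary(coxph(Surv(c(ii)) ~ gl(2, 50)))$coef[1, ])
  if (inherits(oo, 'try-error')) rep(NA, 5) else oo
})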

... except that it doesn't produce the error. The actual function I call from inside parApply is a huge one I wrote myself that is too long to try to post here. But I'm not trying to get someone to debug my function. I'm trying to find out where to look for more detailed debugging information and who/what I have to strangle to get try() to accomplish its stated purpose.

The function does work with standard apply() and with aaply(...,.parallel=FALSE), but not with aaply(...,.parallel=TRUE).

The only thing I see on the screen log (besides normal warning messages that accompany the loading of the packages I use) is Execution halted.

When I do stopCluster(c0) I get the following additional output:

Error in serialize(data, node$con) : ignoring SIGPIPE signal

Does anybody know where else to look? I am running R 2.15.1 on CentOS release 5.4 (Final). Are there types of errors that can propagate upward despite my attempt to catch them with try()? Is there maybe some timeout option in parallel I can set to make the worker nodes more patient?


First, I started using makeCluster(16,outfile='',type='FORK') instead of the default PSOCK cluster type. This made things a hell of a lot more stable, possibly because FORK clones the entire environment, so I no longer have to remember to manually export every dependency, and/or (not sure here) because FORK doesn't have to send serialized data through a loopback port.
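To illustrate the "manually export every dependency" point, here is a minimal sketch, not the asker's actual code. The names scale_factor and f are hypothetical stand-ins. With a PSOCK cluster the workers are fresh R sessions, so any global the worker function uses has to be shipped with clusterExport(); with a FORK cluster (Unix-alikes only) the workers inherit the master's workspace.

```r
library(parallel)

scale_factor <- 10                  # hypothetical global the worker function depends on
f <- function(x) x * scale_factor   # hypothetical worker function

# PSOCK (the default type): workers start as fresh R sessions
cl <- makeCluster(2)
# without this export the workers would fail with "object 'scale_factor' not found"
clusterExport(cl, "scale_factor", envir = environment())
psock_res <- unlist(parLapply(cl, 1:4, f))
stopCluster(cl)

# FORK: workers are copies of the master process, so no export is needed
if (.Platform$OS.type == "unix") {
  cl <- makeCluster(2, type = "FORK")
  fork_res <- unlist(parLapply(cl, 1:4, f))
  stopCluster(cl)
}
```

Both variants should give the same values; the difference is only in how much setup the workers need before the call.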

Anyway, under some circumstances the error reading from connection would come back. I got distracted by the unfamiliar problem domain and vague error messages and forgot that the same troubleshooting heuristics apply here as always:

Turns out, as the answerer implied, try() only catches errors. An unexpected result that's the wrong data type or the wrong size or is NULL will pass right through try() and tryCatch() and crash whatever is trying to fit the result back into an array!
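A minimal sketch of the kind of guard that would have saved me, assuming each task is supposed to return a numeric vector of known length. The helper name safe_row is made up for this example. It treats an error, a NULL, and a wrong-length result all the same way, by padding with NAs, so that whatever assembles the results into an array always gets rows of the same shape.

```r
# Hypothetical helper: coerce whatever `expr` yields into a numeric vector of length n
safe_row <- function(expr, n) {
  # `expr` is evaluated lazily, i.e. inside tryCatch, so real errors are caught here
  out <- tryCatch(expr, error = function(e) NULL)
  # anything that is not a numeric vector of the expected length becomes a row of NAs
  if (is.numeric(out) && length(out) == n) out else rep(NA_real_, n)
}

results <- lapply(1:4, function(i) {
  safe_row(switch(i,
    c(1, 2, 3),      # well-formed result
    stop("boom"),    # genuine error, caught by tryCatch
    NULL,            # not an error, but the wrong type
    c(1, 2)          # not an error, but the wrong length
  ), n = 3)
})

mat <- do.call(rbind, results)   # 4 x 3 matrix; rows 2 to 4 are all NA
```

The same wrapper can go around the body of the function passed to parApply, so the check runs on the worker before anything is sent back.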

Thank god it wasn't some crazy non-deterministic race condition or something. Woot. Thanks for reading, hope my experience helps someone else.

Upvotes: 2

Views: 4405

Answers (1)

Steve Weston

Reputation: 19677

There may be nothing wrong with your use of the try function. It may be that your function is causing a worker process to exit. In that case, the master process will get an error reading from the socket connection to that worker, resulting in the error message:

Error in unserialize(node$con) : error reading from connection

parApply doesn't catch this error, but propagates it, causing your script to exit with the message "Execution halted".

I can reproduce this scenario with:

library(parallel)
cl <- makePSOCKcluster(4)
clusterApply(cl, 1:10, function(i) {
  tryCatch({
    quit(save='no', status=1)
  },
  error=function(e) {
    NULL
  })
})

When I execute it, I get the output:

Error in unserialize(node$con) : error reading from connection
Calls: clusterApply ... FUN -> recvData -> recvData.SOCKnode -> unserialize
Execution halted

Unfortunately, this tells us nothing about what is causing a worker process to exit, but I think that's where you should focus your efforts, rather than struggling with the try function.
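One way to narrow down which task kills a worker is sketched below. This is a rough idea under stated assumptions, not a definitive recipe: the progress file and the "start"/"done" markers are hypothetical, the quit() call merely stands in for the real failure, and the workers are assumed to run on the same machine as the master so they can all write to the same temporary file. Each task appends a marker before and after doing its work, and the master wraps the call in tryCatch so the connection error doesn't halt the script.

```r
library(parallel)

progress <- tempfile(fileext = ".log")   # hypothetical progress file shared by all workers
cl <- makePSOCKcluster(2)

res <- tryCatch(
  clusterApply(cl, 1:4, function(i, path) {
    # cat(file=, append=) opens and closes the file on each call,
    # so the marker is on disk before the task body runs
    cat("start", i, "\n", file = path, append = TRUE)
    if (i == 3) quit(save = "no", status = 1)   # stand-in for whatever kills the worker
    cat("done", i, "\n", file = path, append = TRUE)
    i
  }, path = progress),
  error = function(e) e   # the master-side "error reading from connection"
)
try(stopCluster(cl), silent = TRUE)   # the cluster is unusable once a worker has died

# tasks that logged "start" but never "done" are the suspects
log_lines <- readLines(progress)
task_ids <- function(tag) {
  hits <- grep(paste0("^", tag, " "), log_lines, value = TRUE)
  as.integer(sub("^[a-z]+ ([0-9]+).*$", "\\1", hits))
}
unfinished <- setdiff(task_ids("start"), task_ids("done"))
```

Here unfinished contains task 3. It may also list tasks that were still running on other workers when the failure was detected, so treat it as a shortlist of inputs to re-run serially rather than a definitive answer.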

Upvotes: 5
