Reputation: 11728
Trying this code
library(future)
library(foreach)
ncores <- 3
cl <- parallel::makeCluster(3)
avail <- bigstatsr::FBM(ncores, 1, type = "integer", init = 1)
doFuture::registerDoFuture()
res <- vector("list", 5)
for (i in seq_along(res)) {
while (sum(avail[]) == 0) {
cat("Waiting..\n")
Sys.sleep(0.5)
}
ind.avail <- which(avail[] == 1)
cat("Available:", length(ind.avail), "\n")
plan(cluster, workers = cl[ind.avail])
foo <- foreach(i = 3:1) %dopar% {
Sys.sleep(i)
}
print(one <- ind.avail[1])
avail[one] <- 0; print(avail[])
res[[i]] <- cluster(workers = cl[one], {
Sys.sleep(5)
avail[one] <- 1
i
})
}
sapply(res, resolved)
parallel::stopCluster(cl)
Error I get: Initialization of plan() failed, because the test future used for validation failed. The reason was: Unexpected result (of class ‘NULL’ != ‘FutureResult’) retrieved for ClusterFuture future (label = ‘<none>’, expression = ‘NA’)
.
Explanation of my example trying to reproduce my real problem:
So my idea was to parallelize the first step over all available clusters and to run the second step asynchronously using one cluster only. This cluster would not be available anymore until this asynchronous job is finished. Then the next first step would have one less cluster available and so on. When there is no available cluster anymore for the first step, it would wait for some asynchronous job to finish and to release some cluster.
Upvotes: 2
Views: 1175
Reputation: 6805
I can reproduce this. I believe you are managing to corrupt the communication with the main R process and the cluster node by calling plan()
with a cluster node that holds results of a future that have not yet been brought back to the main R process. (I tried to come up with a simpler example of this type of corruption, but it's not obvious without spending much more time.)
The future framework already detects this (hence the error). I've updated the develop version of future to give some more clues and evidence on what is happening:
Error: Initialization of plan() failed, because the test future used for
validation failed. The reason was: Unexpected result (of class ‘character’
!= ‘FutureResult’) retrieved for ClusterFuture future (label =
‘future-plan-test’, expression = ‘NA’): future-grmall. This suggests that
the communication with ClusterFuture worker (‘SOCKnode’ #1) is out of sync.
I think you can get around this by making sure that you collect the value of the resolved futures before re-using their workers again. The plan(cluster, ...)
call validates that at least one future can be successfully resolved.
Upvotes: 1