Reputation: 238
I'm using doSNOW- package for parallelizing tasks, which differ in length. When one thread is finished, I want
It works in singlethreaded (see makeClust(spec = 1 ))
#Register Snow and doSNOW
require(doSNOW)
#CHANGE spec to 4 or more, to see what my problem is
registerDoSNOW(cl <- makeCluster(spec=1,type="SOCK",outfile=""))
numbersProcessed <- c() # init processed vector
x <- foreach(i = 1:10,.export=numbersProcessed) %dopar% {
#Do working stuff
cat(format(Sys.time(), "%X"),": ","Starting",i,"(Numbers processed so far:",numbersProcessed, ")\n")
Sys.sleep(time=i)
#Appends this number to general vector
numbersProcessed <- append(numbersProcessed,i)
cat(format(Sys.time(), "%X"),": ","Ending",i,"\n")
cat("--------------------\n")
}
#End it all
stopCluster(cl)
Now change the spec in "makeCluster" to 4. Output is something like this:
[..]
Type: EXEC
18:12:21 : Starting 9 (Numbers processed so far: 1 5 )
18:12:23 : Ending 6
--------------------
Type: EXEC
18:12:23 : Starting 10 (Numbers processed so far: 2 6 )
18:12:25 : Ending 7
At 18:12:21 thread 9 knew, that thread 1 and 5 have been processed. 2 seconds later thread 6 ends. The next thread has to know at least about 1, 5 and 6, right?. But thread 10 only knows about 6 and 2.
I realized, this has to do something with the cores specified in makeCluster. 9 knows about 1, 5 and 9 (1 + 4 + 4), 10 knows about 2,6 and 10 (2 + 4 + 4).
Is there a better way to pass "processed" stuff to further generations of threads?
Bonuspoints: Is there a way to "print" to the master- node in parallel processing, without having these "Type: EXEC" etc messages from the snow package? :)
Thanks! Marc
Upvotes: 0
Views: 531
Reputation: 238
My bad. Damn.
I thought, foreach with %dopar% is load-balanced. This isn't the case, and makes my question absolete, because there can nothing be executed on the host-side while parallel processing. That explains why global variables are only manipulated on the client side and never reach the host.
Upvotes: 1