Robert Kubrick

Reputation: 8733

How does snow distribute list elements to workers?

How many list elements are sent to each worker process when calling parLapply()? For example, let's say we have a list of 6 elements and 2 workers on a snow SOCK cluster. Does parLapply() send three list elements to each worker in one send call, or does it send one element per send?

I want to minimize my cluster communication overhead (I have many list elements that each CPU can process relatively quickly), and from what I see on the htop CPU meters, it looks like snow is sending one list element at a time. Is it possible to set the number of list elements dispatched in one send call?

Upvotes: 2

Views: 307

Answers (1)

Steve Weston

Reputation: 19677

The parLapply function splits the input into one chunk per worker. It does that with the splitList function, as seen in the implementation of parLapply:

function (cl = NULL, X, fun, ...) 
  do.call(c, clusterApply(cl, x = splitList(X, length(cl)), fun = lapply,
                          fun, ...), quote = TRUE)

So with a list of 6 elements and 2 workers, it will send 3 elements to each worker with a single "send" operation per worker. This is similar to the behavior of mclapply with mc.preschedule set to TRUE (the default value).

So it seems that parLapply is already performing the optimization that you want.
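To see the chunking for yourself without starting a cluster, here is a minimal sketch. It assumes base R's parallel package (which incorporates snow's functions) and uses its exported splitIndices function to mimic what splitList does; the variable names are only for illustration.

```r
library(parallel)

X <- as.list(1:6)   # six list elements, as in the question
n_workers <- 2      # stands in for length(cl)

# splitIndices() divides the index range into one block per worker;
# subsetting X with each block gives the chunk that worker would receive.
chunks <- lapply(splitIndices(length(X), n_workers), function(i) X[i])

# Number of elements in each chunk.
print(lengths(chunks))
```

Each entry of chunks corresponds to a single send, so the number of sends equals the number of workers rather than the number of list elements.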

It's interesting to note that by simply changing lapply to mclapply in the definition of parLapply, you can create a hybrid parallel programming function that might work quite well with nodes that have many cores.
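A rough sketch of that hybrid idea, assuming base R's parallel package rather than snow itself. The name hybridLapply and its mc.cores argument are hypothetical, not part of either package, and because mclapply relies on forking, values of mc.cores above 1 will not work on Windows.

```r
library(parallel)

# Hypothetical helper: one chunk per cluster worker, and each worker
# processes its chunk with mclapply instead of lapply.
hybridLapply <- function(cl, X, fun, ..., mc.cores = 2L) {
  # Same one-chunk-per-worker split that parLapply performs.
  chunks <- lapply(splitIndices(length(X), length(cl)), function(i) X[i])
  # clusterApply sends one chunk to each worker; the worker then forks
  # up to mc.cores processes to work through its chunk.
  res <- clusterApply(cl, x = chunks, fun = mclapply, FUN = fun, ...,
                      mc.cores = mc.cores)
  # Concatenate the per-worker result lists in their original order.
  do.call(c, res, quote = TRUE)
}

cl <- makeCluster(2)
squares <- hybridLapply(cl, as.list(1:6), function(v) v^2)
stopCluster(cl)
```

Whether this actually helps depends on how expensive each element is relative to the cost of forking, so it is worth timing against plain parLapply on your own workload.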

Upvotes: 5
