Robert Kubrick

Reputation: 8733

Distributing lists on a snow cluster

The snow package's parXapply() functions distribute work well when the data is contained in a single list or matrix, but in this case I need to run a function over four different lists of matrices.

For example, this is what I have now:

myfun = function(name, listA, listB, listC, listD) {
  matrixA = listA[[name]]
  matrixB = listB[[name]]
  matrixC = listC[[name]]
  matrixD = listD[[name]]
  # ... process the four matrices ...
}

res.list = parLapply(cl, mynames, myfun, listA, listB, listC, listD)

The problem I am having is that the matrices are very large, and I suspect that calling parLapply() on the full lists transfers all of the data to every cluster node. That can be very time-consuming and hurts cluster performance.

How can I split the lists before calling myfun() and only send the relevant matrices to each node for processing?

Upvotes: 2

Views: 948

Answers (2)

TedPavlic

Reputation: 231

I think Robert Kubrick's answer using clusterMap is the best fit for this question. However, other people searching for an answer to a related question may benefit from another option: mcmapply, the multi-core version of mapply. For example:

mcmapply(rep, 1:4, 4:1)

mcmapply implements a parallel mapply using forking, which means that it is not an option on Windows machines. Moreover, there can be complications if you're running R within a GUI. By the way, there is also an mclapply that is the multi-core version of lapply.

So mcmapply and mclapply are the simplest versions of what you might otherwise expect to be called parMapply and parLapply.
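Applied to the question's four-list setup, a minimal sketch might look like this (toy data and a hypothetical myfun body, just for illustration; assumes a Unix-alike where forking works, and falls back to one core elsewhere):

```r
library(parallel)

# Toy stand-ins for the question's four named lists of matrices.
mynames <- c("a", "b")
listA <- listB <- listC <- listD <-
  setNames(lapply(1:2, function(i) matrix(i, 2, 2)), mynames)

# Hypothetical worker: sum the four matrices for one name.
myfun <- function(name, listA, listB, listC, listD) {
  listA[[name]] + listB[[name]] + listC[[name]] + listD[[name]]
}

# mcmapply steps over mynames in parallel via fork(); the full lists go
# in MoreArgs, but forking shares them copy-on-write rather than
# serializing them to each worker.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res.list <- mcmapply(myfun, mynames,
                     MoreArgs = list(listA, listB, listC, listD),
                     SIMPLIFY = FALSE, mc.cores = cores)
```

Because the workers are forked from the master process, the large lists are not copied up front, which sidesteps the transfer cost the question worries about.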

Upvotes: 0

Robert Kubrick

Reputation: 8733

clusterMap() does the job:

res.list = clusterMap(cl, myfun, mynames, listA, listB, listC, listD)

Somehow the parMapply() wrapper was left out of the package.
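A self-contained sketch of the pattern with toy data (a two-worker PSOCK cluster stands in for cl here). One caveat: clusterMap, like mapply, passes one element of each list per call, so myfun receives the four matrices directly instead of indexing the full lists:

```r
library(parallel)

# Toy stand-ins for the question's objects.
mynames <- c("a", "b")
listA <- listB <- listC <- listD <-
  setNames(lapply(1:2, function(i) matrix(i, 2, 2)), mynames)

# myfun now takes the matrices themselves, one per list.
myfun <- function(name, A, B, C, D) A + B + C + D

cl <- makeCluster(2)  # PSOCK workers, so this also runs on Windows
# Each task is sent only its own four matrices, not the full lists.
res.list <- clusterMap(cl, myfun, mynames, listA, listB, listC, listD)
stopCluster(cl)
```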

Upvotes: 5
