Reputation: 8733
The snow package's parXapply() functions distribute work very well when the data is contained in a single list or matrix, but in this case I need to run a function over four different lists of matrices.
For example, this is what I have now:
myfun = function(name, listA, listB, listC, listD) {
    # look up the four matrices for this name in the full lists
    matrixA = listA[[name]]
    matrixB = listB[[name]]
    matrixC = listC[[name]]
    matrixD = listD[[name]]
    # ... process the four matrices ...
}

res.list = parLapply(cl, mynames, myfun, listA, listB, listC, listD)
The problem I am having is that the matrices are very large, and I suspect that calling parLapply() with the full lists as extra arguments transfers all of the data to every cluster node. This is very time-consuming and hurts cluster performance.
How can I split the lists before calling myfun() and only send the relevant matrices to each node for processing?
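For instance, I imagine zipping the four lists into one list of per-name bundles, so that parLapply() splits that single list and each node receives only its own chunk. A rough sketch of what I mean (the bundle structure here is just illustrative):

bundles = lapply(mynames, function(name)
    list(name = name,
         A = listA[[name]], B = listB[[name]],
         C = listC[[name]], D = listD[[name]]))
res.list = parLapply(cl, bundles, function(b) {
    # b$A, b$B, b$C and b$D are the four matrices for one name;
    # ... process them here and return the result ...
})

Is there a built-in function that does this?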
Upvotes: 2
Views: 948
Reputation: 231
I think the answer given by Robert Kubrick, using clusterMap(), best answers this question. However, other people searching for an answer to a related question may benefit from another option: mcmapply(), which is the multi-core version of mapply(). For example:
library(parallel)
mcmapply(rep, 1:4, 4:1)   # parallel equivalent of mapply(rep, 1:4, 4:1)
mcmapply() implements a parallel mapply() using forking, which means that it is not an option on Windows machines. Moreover, there can be complications if you're running R within a GUI. By the way, there is also an mclapply() that is the multi-core version of lapply().

So mcmapply() and mclapply() are the simplest versions of what you might otherwise expect to be called parMapply() and parLapply().
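Applied to the question's four-list setup, a minimal sketch might look like this (assuming myfun is rewritten to take the per-name matrices directly, and picking mc.cores = 4 arbitrarily). Because mcmapply() forks the workers, the children share the parent's memory copy-on-write, so the large lists are not serialized and shipped the way they would be over a socket cluster:

library(parallel)
# each call gets the i-th name and the i-th matrix from each list;
# SIMPLIFY = FALSE keeps the results in a list, like parLapply()
res.list = mcmapply(myfun, mynames, listA, listB, listC, listD,
                    SIMPLIFY = FALSE, mc.cores = 4)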
Upvotes: 0
Reputation: 8733
clusterMap() does the job:
# sends each node only the i-th elements, like a parallel mapply()
res.list = clusterMap(cl, myfun, mynames, listA, listB, listC, listD)
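Note that clusterMap() passes the i-th element of every list directly to each call, so myfun no longer needs to index the full lists by name. A sketch of the adjusted signature (body left as a placeholder):

myfun = function(name, matrixA, matrixB, matrixC, matrixD) {
    # matrixA..matrixD are already the per-name matrices;
    # ... process them and return the result ...
}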
Somehow the parMapply() wrapper was left out of the package.
Upvotes: 5