Reputation: 137
I was reading Parallel Computing docs of Julia, and having never done any parallel coding, I was left wanting a gentler intro. So, I thought of a (probably) simple problem that I couldn't figure out how to code in parallel Julia paradigm.
Let's say I have a matrix/dataframe df
from some experiment. Its N
rows are variables, and M
columns are samples. I have a method pwCorr(..)
that calculates pairwise correlation of rows. If I wanted an NxN matrix of all the pairwise correlations, I'd probably run a for-loop that'd iterate for N*N/2
(upper or lower triangle of the matrix) and fill in the values; however, this seems like a perfect thing to parallelize since each of the pwCorr()
calls are independent of others. (Am I correct in thinking this way about what can be parallelized, and what cannot?)
To do this, I feel like I'd have to create a DArray
that gets filled by a @parallel
for loop. And if so, I'm not sure how this can be achieved in Julia. If that's not the right approach, I guess I don't even know where to begin.
Upvotes: 5
Views: 563
Reputation: 2006
This should work, first you need to propagate the top level variable (data) to all the workers:
for pid in workers()
remotecall(pid, x->(global data; data=x; nothing), data)
end
then perform the computation in chunks using the DArray constructor with some fancy indexing:
corrs = DArray((20,20)) do I
out=zeros(length(I[1]),length(I[2]))
for i=I[1], j=I[2]
if i<j
out[i-minimum(I[1])+1,j-minimum(I[2])+1]= 0.0
else
out[i-minimum(I[1])+1,j-minimum(I[2])+1] = cor(vec(data[i,:]), vec(data[j,:]))
end
end
out
end
In more detail, the DArray
constructor takes a function which takes a tuple of index ranges and returns a chunk of the resulting matrix which corresponds to those index ranges. In the code above, I
is the tuple of ranges with I[1]
being the first range. You can see this more clearly with:
julia> DArray((10,10)) do I
println(I)
return zeros(length(I[1]),length(I[2]))
end
From worker 2: (1:10,1:5)
From worker 3: (1:10,6:10)
where you can see it split the array into two chunks on the second axis.
The trickiest part of the example was converting from these 'global' index ranges to local index ranges by subtracting off the minimum element and then adding back 1 for the 1 based indexing of Julia. Hope that helps!
Upvotes: 3