Reputation: 141
Is there a good way to do parallel matrix multiplication in julia? I tried using DArrays, but it was significantly slower than just a single-thread multiplication.
Upvotes: 6
Views: 3081
Reputation: 12179
It sounds like you're interested in dense matrices, in which case see the other answers. Should you be (or become) interested in sparse matrices, see https://github.com/madeleineudell/ParallelSparseMatMul.jl.
Upvotes: 5
Reputation: 2518
The problem is most likely that direct (maybe single-threaded) matrix-multiplication is normally performed with an optimized library function. In the case of OpenBLAS, this is already multithreaded. For arrays with size 2000x2000
, the simple matrixmultiplication
@time c = sa * sb;
results in 0.3 seconds multithreaded and 0.7 seconds singlethreaded.
Splitting of a single dimension in multiplication the times get even worse and reach around 17 seconds in singlethreaded mode.
@time for j = 1:n
sc[:,j] = sa[:,:] * sb[:,j]
end
The solution to your problem might be the use of shared arrays, which share the same data across your processes on a single computer. Please note that shared arrays are still marked as experimental.
# create shared arrays and initialize them with random numbers
sa = SharedArray(Float64,(n,n),init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
sb = SharedArray(Float64,(n,n),init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
sc = SharedArray(Float64,(n,n));
Then you have to create a function, which performs a cheap matrix multiplication on a subset of the matrix.
@everywhere function mymatmul!(n,w,sa,sb,sc)
# works only for 4 workers and n divisible by 4
range = 1+(w-2) * div(n,4) : (w-1) * div(n,4)
sc[:,range] = sa[:,:] * sb[:,range]
end
Finally, the main process tells the workers to work on their part.
@time @sync begin
for w in workers()
@async remotecall_wait(w, mymatmul!, n, w, sa, sb, sc)
end
end
which takes around 0.3 seconds
which is the same time as the multithreaded single-process time.
Upvotes: 5
Reputation: 11664
Parallel in what sense? If you mean single-machine, multi-threaded, then Julia does this by default as OpenBLAS (the underlying linear algebra library used) is multithreaded.
If you mean multiple-machine, distributed-computing-style, then you will be encountering a lot of communications overhead that will only be worth it for very large problems, and a customized approach might be needed.
Upvotes: 6