Omar Abudayyeh

Reputation: 141

How do you do parallel matrix multiplication in Julia?

Is there a good way to do parallel matrix multiplication in Julia? I tried using DArrays, but it was significantly slower than a single-threaded multiplication.

Upvotes: 6

Views: 3081

Answers (3)

tholy

Reputation: 12179

It sounds like you're interested in dense matrices, in which case see the other answers. Should you be (or become) interested in sparse matrices, see https://github.com/madeleineudell/ParallelSparseMatMul.jl.

Upvotes: 5

Stefan

Reputation: 2518

The problem is most likely that direct matrix multiplication is normally performed by an optimized library function. In the case of OpenBLAS, this is already multithreaded. For matrices of size 2000x2000, the simple matrix multiplication

@time c = sa * sb;

results in 0.3 seconds multithreaded and 0.7 seconds single-threaded.

Splitting the multiplication along a single dimension makes the times even worse: they reach around 17 seconds in single-threaded mode.

@time for j = 1:n
    # multiply column by column; note that sa[:,:] also makes a full copy of sa on every iteration
    sc[:,j] = sa[:,:] * sb[:,j]
end

shared arrays

The solution to your problem might be the use of shared arrays, which share the same data across all processes on a single machine. Please note that shared arrays are still marked as experimental.

# create shared arrays and initialize them with random numbers
sa = SharedArray(Float64,(n,n),init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
sb = SharedArray(Float64,(n,n),init = s -> s[localindexes(s)] = rand(length(localindexes(s))))
sc = SharedArray(Float64,(n,n));

Then you have to create a function that performs a cheap matrix multiplication on a subset of the matrix.

@everywhere function mymatmul!(n,w,sa,sb,sc)
    # works only for 4 workers (process ids 2:5) and n divisible by 4
    range = 1+(w-2) * div(n,4) : (w-1) * div(n,4)
    sc[:,range] = sa * sb[:,range]   # no need to copy sa with sa[:,:]
end
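As a side note, the hard-coded worker count can be avoided. Here is a sketch of a generalized version (the helper name `mymatmul_general!` is mine, not from the answer, and it is written against current Julia's `findfirst(==(x), xs)` form):

```julia
# Hypothetical generalization: split the columns over however many workers
# exist, giving the last worker the leftover columns when n is not evenly
# divisible by the number of workers.
@everywhere function mymatmul_general!(n, w, sa, sb, sc)
    ws = workers()
    i  = findfirst(==(w), ws)            # 1-based position of this worker
    nw = length(ws)
    cols  = div(n, nw)
    range = (i - 1) * cols + 1 : (i == nw ? n : i * cols)
    sc[:, range] = sa * sb[:, range]
end
```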

Finally, the main process tells the workers to work on their part.

@time @sync begin
    for w in workers()
        @async remotecall_wait(w, mymatmul!, n, w, sa, sb, sc)
    end
end

This takes around 0.3 seconds, the same as the multithreaded single-process time.
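For readers on current Julia (1.x), the same sketch needs a few renames: `SharedArrays` and `Distributed` are now standard libraries, `localindexes` became `localindices`, and `remotecall_wait` takes the function before the worker id. Treat the following as an untested adaptation of the code above:

```julia
# Sketch of the shared-array approach on Julia 1.x (names per the
# Distributed and SharedArrays standard-library docs).
using Distributed
addprocs(4)
@everywhere using SharedArrays

n = 2000
sa = SharedArray{Float64}(n, n; init = s -> s[localindices(s)] = rand(length(localindices(s))))
sb = SharedArray{Float64}(n, n; init = s -> s[localindices(s)] = rand(length(localindices(s))))
sc = SharedArray{Float64}(n, n)

@everywhere function mymatmul!(n, w, sa, sb, sc)
    # still assumes 4 workers (ids 2:5) and n divisible by 4
    range = 1 + (w - 2) * div(n, 4) : (w - 1) * div(n, 4)
    sc[:, range] = sa * sb[:, range]
end

@time @sync for w in workers()
    @async remotecall_wait(mymatmul!, w, n, w, sa, sb, sc)  # function comes first in 1.x
end
```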

Upvotes: 5

IainDunning

Reputation: 11664

Parallel in what sense? If you mean single-machine, multi-threaded, then Julia does this by default as OpenBLAS (the underlying linear algebra library used) is multithreaded.
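You can see this for yourself by querying and controlling the BLAS thread count from Julia. On current versions this lives in the `LinearAlgebra` standard library (`BLAS.get_num_threads` is available from Julia 1.6 on); a minimal comparison sketch:

```julia
using LinearAlgebra

a = rand(2000, 2000)
b = rand(2000, 2000)

BLAS.set_num_threads(1)                 # force single-threaded BLAS
@time a * b

BLAS.set_num_threads(Sys.CPU_THREADS)   # let BLAS use all cores
@time a * b
```

The second multiplication should be noticeably faster on a multi-core machine, with no changes to the Julia code itself.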

If you mean multiple-machine, distributed-computing-style, then you will be encountering a lot of communications overhead that will only be worth it for very large problems, and a customized approach might be needed.

Upvotes: 6
