Mizzle
Mizzle

Reputation: 757

Julia parallel computing for loop

I would like to calculate the summation of elements from a large upper triangular matrix. The regular Julia code is below.

function upsum(M); n = size(M)[1]; sum = 0
    for i = 1:n-1 for j = i+1:n
        sum = sum + M[i,j]
        end
    end
    return sum
end

R = randn(10000,10000)
upsum(R)

Since the matrix is very large, I would like to know is there anyway to improve the speed. How can I use parallel computing here?

Upvotes: 2

Views: 1212

Answers (1)

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69819

I would use threads not parallel processing in this case. Here is an example code:

using Base.Threads

function upsum_threads(M)
    n = size(M, 1)
    chunks = nthreads()
    sums = zeros(eltype(M), chunks)
    chunkend = round.(Int, n * sqrt.((1:chunks) ./ chunks))
    @assert minimum(diff(chunkend)) > 0
    chunkstart = [2; chunkend[1:end-1] .+ 1]
    @threads for job in 1:chunks
        s = zero(eltype(M))
        for i in chunkstart[job]:chunkend[job]
            @simd for j in 1:(i-1)
                @inbounds s += M[j, i]
            end
        end
        sums[job] = s
    end
    return sum(sums)
end

R = randn(10000,10000)
upsum_threads(R)

It should give you a significant speedup (even if you remove @threads it should be much faster).

You choose number of threads Julia uses by setting JULIA_NUM_THREADS environment variable.

Upvotes: 4

Related Questions