Akon
Akon

Reputation: 113

Improving the speed of a for loop in Julia

Here is my code in Julia platform and I like to speed it up. Is there anyway that I can make this faster? It takes 0.5 seconds for a dataset of 50k*50k. I was expecting Julia to be a lot faster than this or I am not sure if I am doing a silly implementation.

ar = [[1,2,3,4,5], [2,3,4,5,6,7,8], [4,7,8,9], [9,10], [2,3,4,5]]

SV = rand(10,5)

function h_score_0(ar ,SV)
    m = length(ar)
    SC = Array{Float64,2}(undef, size(SV, 2), m)
    for iter = 1:m
        nodes = ar[iter]
        for jj = 1:size(SV, 2)
            mx = maximum(SV[nodes, jj])
            mn = minimum(SV[nodes, jj])
            term1 = (mx - mn)^2;
            SC[jj, iter] = (term1);
        end
    end
    return score = sum(SC, dims = 1)
end

Upvotes: 1

Views: 2202

Answers (1)

DNF
DNF

Reputation: 12644

You have some unnecessary allocations in your code:

mx = maximum(SV[nodes, jj])
mn = minimum(SV[nodes, jj])

Slices allocate, so each line makes a copy of the data here, you're actually copying the data twice, once on each line. You can either make sure to copy only once, or even better: use view, so there is no copy at all (note that view is much faster on Julia v1.5, in case you are using an older version).

SC = Array{Float64,2}(undef, size(SV, 2), m)

And no reason to create a matrix here, and sum over it afterwards, just accumulate while you are iterating:

score[i] += (mx - mn)^2

Here's a function that is >5x as fast on my laptop for the input data you specified:

function h_score_1(ar, SV)
    score = zeros(eltype(SV), length(ar))
    @inbounds for i in eachindex(ar)
        nodes = ar[i]
        for j in axes(SV, 2)
            SVview = view(SV, nodes, j)
            mx = maximum(SVview)
            mn = minimum(SVview)
            score[i] += (mx - mn)^2
        end
    end
    return score
end

This function outputs a one-dimensional vector instead of a 1xN matrix in your original function.

In principle, this could be even faster if we replace

mx = maximum(SVview)
mn = minimum(SVview)

with

(mn, mx) = extrema(SVview)

which only traverses the vector once, instead of twice. Unfortunately, there is a performance issue with extrema, so it is currently not as fast as separate maximum/minimum calls: https://github.com/JuliaLang/julia/issues/31442

Finally, for absolutely getting the best performance at the cost of brevity, we can avoid creating a view at all and turn the calls to maximum and minimum into a single explicit loop traversal:

function h_score_2(ar, SV)
    score = zeros(eltype(SV), length(ar))
    @inbounds for i in eachindex(ar)
        nodes = ar[i]
        for j in axes(SV, 2)
            mx, mn = -Inf, +Inf
            for node in nodes
                x = SV[node, j]
                mx = ifelse(x > mx, x, mx)
                mn = ifelse(x < mn, x, mn)
            end
            score[i] += (mx - mn)^2
        end
    end
    return score
end

This also avoids the performance issue that extrema suffers, and looks up the SV element once per node. Although this version is annoying to write, it's substantially faster, even on Julia 1.5 where views are free. Here are some benchmark timings with your test data:

julia> using BenchmarkTools

julia> @btime h_score_0($ar, $SV)
  2.344 μs (52 allocations: 6.19 KiB)
1×5 Matrix{Float64}:
 1.95458  2.94592  2.79438  0.709745  1.85877

julia> @btime h_score_1($ar, $SV)
  392.035 ns (1 allocation: 128 bytes)
5-element Vector{Float64}:
 1.9545848011260765
 2.9459235098820167
 2.794383144368953
 0.7097448590904598
 1.8587691646610984

julia> @btime h_score_2($ar, $SV)
  118.243 ns (1 allocation: 128 bytes)
5-element Vector{Float64}:
 1.9545848011260765
 2.9459235098820167
 2.794383144368953
 0.7097448590904598
 1.8587691646610984

So explicitly writing out the innermost loop is worth it here, reducing time by another 3x or so. It's annoying that the Julia compiler isn't yet able to generate code this efficient, but it does get smarter with every version. On the other hand, the explicit loop version will be fast forever, so if this code is really performance critical, it's probably worth writing it out like this.

Upvotes: 7

Related Questions