Minotaur
Minotaur

Reputation: 1

How to efficiently parallelize accumulation task in Julia using @threads?

I'm new to multithreading. I am trying to do an accumulation task in Julia, similar to the following:

a = collect(1:5)
b = 0.0
for i = 1:5
    b += a[6-i] + i
end

I would like to parallelize this using @threads, but when I do

a = collect(1:5)
b = 0.0
Threads.@threads for i = 1:5
    b += a[6-i] + i
end

I get the following error:

ERROR: TaskFailedException Stacktrace: [1] wait @ ./task.jl:345 [inlined] [2] threading_run(fun::var"#263#threadsfor_fun#57"{var"#263#threadsfor_fun#56#58"{UnitRange{Int64}}}, static::Bool) Base.Threads ./threadingconstructs.jl:38 [3] top-level scope @ ./threadingconstructs.jl:89

nested task error: UndefVarError: b not defined Stacktrace: [1] macro expansion @ ./REPL[35]:2 [inlined] [2] #263#threadsfor_fun#56 @ ./threadingconstructs.jl:84 [inlined] [3] #263#threadsfor_fun @ ./threadingconstructs.jl:51 [inlined] [4] (::Base.Threads.var"#1#2"{var"#263#threadsfor_fun#57"{var"#263#threadsfor_fun#56#58"{UnitRange{Int64}}}, Int64})() Base.Threads ./threadingconstructs.jl:30

What is the correct way to go about parallelizing this task? All of the math I will be doing is isolated and lends itself to multithreading, but I ultimately need to accumulate the output from the threads into a single variable.

I have tried making a vector as long as the number of computations I will be doing, but I also need this method to scale so this doesn't seem like the right solution.

Another idea I came across in this StackOverflow post was to make a vector equal in length to the number of threads and index the vector using Threads.nthreads(). This seems to work fine, but I still wanted to see if this is a suitable method or if I should consider other things.

Upvotes: 0

Views: 302

Answers (2)

mcabbott
mcabbott

Reputation: 2590

There are also various packages which will do threaded reduction, for example:

julia> using Tullio  # mine, recursively sub-divides 1:5

julia> @tullio b := a[6-i] + i
30

julia> using ThreadsX  # TKF, built on folds

julia> ThreadsX.sum(1:5) do i
         a[6-i] + i
       end
30

julia> using LoopVectorization  # re-writes loop, using Polyester.jl

julia> let b = 0.0
         @tturbo for i = 1:5
           b += a[6-i] + i
         end
         b
       end
30.0

Upvotes: 0

Bill
Bill

Reputation: 6086

You can split the sums into subsums on a per-thread basis with Threads.nthreads() and Threads.threadid() to avoid clashes on the b variable:

julia> barray = zeros(Int, Threads.nthreads())
6-element Vector{Int64}:
0
0
0
0
0
0

julia> @Threads.threads for i = 1:5
        barray[Threads.threadid()] += a[6-i] + i
    end

julia> b = sum(barray)
30

NB: This works because the subsequent sums do not depend on the prior ones. If this condition of independence between threads is not met (it is not generally met with the accumulate() function) this method does not work, and you would need a lock and probably would not gain as much by threading.

Upvotes: 0

Related Questions