Reputation: 1
I'm new to multithreading. I am trying to do an accumulation task in Julia, similar to the following:
a = collect(1:5)
b = 0.0
for i = 1:5
b += a[6-i] + i
end
I would like to parallelize this using @threads, but when I do
a = collect(1:5)
b = 0.0
Threads.@threads for i = 1:5
b += a[6-i] + i
end
I get the following error:
ERROR: TaskFailedException Stacktrace: [1] wait @ ./task.jl:345 [inlined] [2] threading_run(fun::var"#263#threadsfor_fun#57"{var"#263#threadsfor_fun#56#58"{UnitRange{Int64}}}, static::Bool) Base.Threads ./threadingconstructs.jl:38 [3] top-level scope @ ./threadingconstructs.jl:89
nested task error: UndefVarError: b not defined Stacktrace: [1] macro expansion @ ./REPL[35]:2 [inlined] [2] #263#threadsfor_fun#56 @ ./threadingconstructs.jl:84 [inlined] [3] #263#threadsfor_fun @ ./threadingconstructs.jl:51 [inlined] [4] (::Base.Threads.var"#1#2"{var"#263#threadsfor_fun#57"{var"#263#threadsfor_fun#56#58"{UnitRange{Int64}}}, Int64})() Base.Threads ./threadingconstructs.jl:30
What is the correct way to go about parallelizing this task? All of the math I will be doing is isolated and lends itself to multithreading, but I ultimately need to accumulate the output from the threads into a single variable.
I have tried making a vector
as long as the number of computations I will be doing, but I also need this method to scale so this doesn't seem like the right solution.
Another idea I came across in this StackOverflow post was to make a vector equal in length to the number of threads and index the vector using Threads.nthreads()
. This seems to work fine, but I still wanted to see if this is a suitable method or if I should consider other things.
Upvotes: 0
Views: 302
Reputation: 2590
There are also various packages which will do threaded reduction, for example:
julia> using Tullio # mine, recursively sub-divides 1:5
julia> @tullio b := a[6-i] + i
30
julia> using ThreadsX # TKF, built on folds
julia> ThreadsX.sum(1:5) do i
a[6-i] + i
end
30
julia> using LoopVectorization # re-writes loop, using Polyester.jl
julia> let b = 0.0
@tturbo for i = 1:5
b += a[6-i] + i
end
b
end
30.0
Upvotes: 0
Reputation: 6086
You can split the sums into subsums on a per-thread basis with Threads.nthreads()
and Threads.threadid()
to avoid clashes on the b variable:
julia> barray = zeros(Int, Threads.nthreads())
6-element Vector{Int64}:
0
0
0
0
0
0
julia> @Threads.threads for i = 1:5
barray[Threads.threadid()] += a[6-i] + i
end
julia> b = sum(barray)
30
NB: This works because the subsequent sums do not depend on the prior ones. If this condition of independence between threads is not met (it is not generally met with the accumulate() function) this method does not work, and you would need a lock and probably would not gain as much by threading.
Upvotes: 0