Danny

Reputation: 21

Synchronously Outputting to DistributedArray of Vectors in Parallel

I'm trying to distribute a function that outputs a vector into an array.

I followed this post with something like the following code:

a = distribute([Float64[] for _ in 1:nrow(df)])
@sync @distributed for i in 1:nrow(df)
  append!(localpart(a)[i], foo(df[i]))
end

But I get the following error:

BoundsError: attempt to access 145-element Vector{Vector{Float64}} at index [147]

I've only ever parallelized with SharedArrays, which aren't an option, since I need to store vectors in the shared array. Any and all advice would be life-saving.

Upvotes: 2

Views: 46

Answers (1)

Przemyslaw Szufel

Reputation: 42244

Each localpart is indexed starting from one. Hence you need to convert between a global index and the local index.

For a one-dimensional array, the function DistributedArrays.localindices() returns a one-element tuple containing the range of global indices that map to the localpart on the calling process. This range can in turn be used for the index conversion:

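@sync @distributed for i in 1:nrow(df)
    # shift the global index i by the first global index owned by this worker
    id = i - DistributedArrays.localindices(a)[1][1] + 1
    # foo returns a vector, so append! (not push!) adds its elements
    append!(localpart(a)[id], foo(df[i,1]))
end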
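The index arithmetic can be checked on its own, without any workers. The following is a minimal sketch; `to_local` is a hypothetical helper introduced only for illustration (it is not part of DistributedArrays), and the range `5:8` is assumed to be what `localindices(a)[1]` returns on one worker.

```julia
# Hypothetical helper (not part of DistributedArrays): map a global index
# to its position inside the chunk that owns the global range `owned`.
to_local(i::Integer, owned::AbstractUnitRange) = i - first(owned) + 1

# Assume this worker owns global indices 5:8.
owned = 5:8

# Global indices 5, 6, 7, 8 become local indices 1, 2, 3, 4.
local_ids = [to_local(i, owned) for i in owned]
println(local_ids)
```

A global index outside `owned` yields a local index outside `1:length(owned)`, which is the kind of out-of-range access that raises a BoundsError on the localpart.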

EDIT

To understand how localindices works, look at this code:

julia> using Distributed

julia> addprocs(4);

julia> @everywhere using DistributedArrays

julia> a = distribute([Float64[] for _ in 1:14]);

julia> fetch(@spawnat 2 DistributedArrays.localindices(a))
(1:4,)

julia> fetch(@spawnat 3 DistributedArrays.localindices(a))
(5:8,)

julia> fetch(@spawnat 4 DistributedArrays.localindices(a))
(9:11,)

julia> fetch(@spawnat 5 DistributedArrays.localindices(a))
(12:14,)
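An alternative that sidesteps the conversion is to let each worker loop over the global indices it owns. The sketch below does not rely on `@distributed` handing iteration i to the worker that holds index i. It assumes the DistributedArrays package is installed, and `foo` here is a hypothetical stand-in for the question's function that takes the global index directly instead of a DataFrame row.

```julia
using Distributed
addprocs(4)
@everywhere using DistributedArrays

# Hypothetical stand-in for the question's foo: returns a small vector.
@everywhere foo(g) = fill(Float64(g), 2)

a = distribute([Float64[] for _ in 1:14])

@sync for p in procs(a)
    @spawnat p begin
        lp = localpart(a)                            # chunk stored on worker p
        owned = DistributedArrays.localindices(a)[1] # global indices of that chunk
        for (k, g) in enumerate(owned)
            append!(lp[k], foo(g))                   # k is local, g is global
        end
    end
end

# Fetch each worker's chunk and concatenate them in worker order.
chunks = [fetch(@spawnat p localpart(a)) for p in procs(a)]
all_vecs = reduce(vcat, chunks)
```

Because every worker only touches its own localpart, the local index k never leaves the bounds of that chunk.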

Upvotes: 1
