Reputation: 158
I have a GroupedDataFrame in Julia 1.4 (DataFrames 0.22.1). I want to iterate over the groups of rows to compute some statistics. Because there are many groups and the computations are slow, I want to do this multithreaded.
The code
grouped_rows = groupby(data, by_index)
for group in grouped_rows
    # do something with `group`
end
works, but
grouped_rows = groupby(data, by_index)
Threads.@threads for group in grouped_rows
    # do something with `group`
end
results in MethodError: no method matching firstindex(::GroupedDataFrame{DataFrame}). Is there a way to parallelize the iteration over groups of DataFrame rows?
Upvotes: 4
Views: 504
Reputation: 42214
You need an AbstractVector for Threads.@threads to work, so collect your grouped_rows first:
Threads.@threads for group in collect(SubDataFrame, grouped_rows)
    # do something with `group`
end
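If you also need to gather a result per group, one possible approach is to loop over the indices of the collected vector and write into a preallocated array, so that each iteration only touches its own slot. This is only a minimal sketch: the sample data, the column names key and value, and the choice of mean as the statistic are made up for illustration and are not from the question.

```julia
using DataFrames
using Statistics

# Hypothetical sample data; `key` and `value` are made-up column names.
data = DataFrame(key = repeat(1:4, inner = 5), value = collect(1.0:20.0))
grouped_rows = groupby(data, :key)

# Materialize the groups into a Vector so Threads.@threads can index into it.
groups = collect(SubDataFrame, grouped_rows)

# One preallocated slot per group, so no two iterations write to the same location.
group_means = Vector{Float64}(undef, length(groups))

Threads.@threads for i in eachindex(groups)
    # Placeholder computation; replace with the actual slow statistic.
    group_means[i] = mean(groups[i].value)
end
```

Note that Julia only uses more than one thread if it was started with several, for example by setting the JULIA_NUM_THREADS environment variable; you can check with Threads.nthreads().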
Upvotes: 4