Reputation: 505
What is the best/most idiomatic way to extract groups into separate DataFrames? This would be useful in many ways (e.g. training a separate model for each group, visualizing each group, saving specific subsets of the data, etc.).
A minimal example of such a problem could be:
using DataFrames
df = DataFrame(Dict(:groups => ["A", "B", "A"],
:val1 => [1, 2, -4],
:val2 => [3, 9, 1]))
The ideal output would be something like:
group_dict = Dict("A" => DataFrame(Dict(:val1 => [1, -4], :val2 => [3, 1])),
"B" => DataFrame(Dict(:val1 => [2], :val2 => [9])))
A clean solution would be leveraging Query.jl's @groupby:
using Query
df |> @groupby(_.groups) |> ?? |> Dict
However, I'm stuck on the last step (i.e. turning this into a dictionary or some other named collection).
Upvotes: 1
Views: 111
Reputation: 6086
The following:
using DataFrames
df = DataFrame(Dict(:groups => ["A", "B", "A"],
:val1 => [1, 2, -4],
:val2 => [3, 9, 1]))
dict = Dict([letter => df[df[!, :groups] .== letter, 2:3]
for letter in unique(df[!, :groups])])
println(dict)
yields
Dict(
"B" => 1×2 DataFrame
│ Row │ val1 │ val2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 2 │ 9 │,
"A" => 2×2 DataFrame
│ Row │ val1 │ val2 │
│ │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1 │ 1 │ 3 │
│ 2 │ -4 │ 1 │)
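which looks like what you needed. Note the unique() in the comprehension: a Dict holds each key only once, so iterating over every row's :groups value would repeat the same filtering for each duplicate label and simply overwrite the earlier entry. Also note that the 2:3 column range relies on :groups being the first column, which holds here because a DataFrame built from a plain Dict orders its columns by name.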
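As a small check of the point about unique(), the sketch below builds the same dictionary while iterating over every row's label, repeats included. It selects columns with Not(:groups) instead of 2:3, which is my own substitution so the example does not depend on column order. The result still has one entry per group; the repeated "A" just recomputes and overwrites the same value.

```julia
using DataFrames

df = DataFrame(groups = ["A", "B", "A"],
               val1 = [1, 2, -4],
               val2 = [3, 9, 1])

# Iterate over all labels, duplicates included: "A" is filtered twice,
# and the second result replaces the first under the same key.
labels = df[!, :groups]
d = Dict(letter => df[df.groups .== letter, Not(:groups)] for letter in labels)

println(length(d))  # number of distinct groups, not number of rows
```

So unique() here avoids redundant work rather than preventing an error.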
Upvotes: 1
Reputation: 1742
You can use DataFrames.groupby(df, :groups) to return a GroupedDataFrame, which is a collection of SubDataFrames (views into df, one per group).
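If a dictionary keyed by group label is still wanted, one possible way to get there from the GroupedDataFrame is sketched below. It assumes a reasonably recent DataFrames.jl in which pairs(gdf) yields GroupKey => SubDataFrame pairs; the variable names gdf and group_dict are just illustrative.

```julia
using DataFrames

df = DataFrame(groups = ["A", "B", "A"],
               val1 = [1, 2, -4],
               val2 = [3, 9, 1])

gdf = groupby(df, :groups)

# key.groups is the group's label; sdf[:, Not(:groups)] copies the
# remaining columns out of the view into a standalone DataFrame.
group_dict = Dict(key.groups => sdf[:, Not(:groups)]
                  for (key, sdf) in pairs(gdf))

println(group_dict["A"])
```

If copies are not needed, the GroupedDataFrame can also be used directly, for example by looping over it with `for sdf in gdf`, which avoids materializing each group.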
Upvotes: 3