Wolf

Reputation: 505

Extract separate grouped DataFrames from another DataFrame

What is the most idiomatic way to extract the groups of a DataFrame into separate DataFrames? This is useful in many situations (e.g. training a separate model for each group, visualizing each group, or saving specific subsets of the data).

A minimal example of such a problem could be:

using DataFrames
df = DataFrame(Dict(:groups => ["A", "B", "A"],
                    :val1 => [1, 2, -4],
                    :val2 => [3, 9, 1]))

The ideal output would be something like:

group_dict = Dict("A" => DataFrame(Dict(:val1 => [1, -4], :val2 => [3, 1])),
                  "B" => DataFrame(Dict(:val1 => [2], :val2 => [9])))

A clean solution would be leveraging Query.jl's @groupby:

using Query
df |> @groupby(_.groups) |> ?? |> Dict

However, I'm stuck on the last step (i.e. turning this into a dictionary or some other named collection).

Upvotes: 1

Views: 111

Answers (2)

Bill

Reputation: 6086

The following:

using DataFrames
df = DataFrame(Dict(:groups => ["A", "B", "A"],
                    :val1 => [1, 2, -4],
                    :val2 => [3, 9, 1]))

# One DataFrame per distinct label; columns 2:3 are val1 and val2 here
dict = Dict(letter => df[df.groups .== letter, 2:3]
    for letter in unique(df.groups))

println(dict)

yields

Dict(
"B" => 1×2 DataFrame
│ Row │ val1  │ val2  │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 2     │ 9     │,

"A" => 2×2 DataFrame
│ Row │ val1  │ val2  │
│     │ Int64 │ Int64 │
├─────┼───────┼───────┤
│ 1   │ 1     │ 3     │
│ 2   │ -4    │ 1     │)

which looks like what you needed. Note that unique() in the comprehension makes each group label appear only once: unlike DataFrames' groupby, a Dict requires unique keys, so without it every repeated label would recompute the same subset and overwrite the existing entry.
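The positional 2:3 above depends on the column order. A minimal sketch of the same idea that drops the grouping column by name instead, assuming a recent DataFrames.jl release (1.x) where Not is exported; the name by_label is only illustrative:

```julia
using DataFrames

df = DataFrame(groups = ["A", "B", "A"],
               val1 = [1, 2, -4],
               val2 = [3, 9, 1])

# Not(:groups) selects every column except :groups, so column order does not matter.
# by_label is a hypothetical name used only for this sketch.
by_label = Dict(g => df[df.groups .== g, Not(:groups)] for g in unique(df.groups))

by_label["A"]  # rows of df where groups == "A", without the groups column
```

Each value is an independent copy, so modifying one of them leaves df untouched.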

Upvotes: 1

Anshul Singhvi

Reputation: 1742

You can use DataFrames.groupby(df, :groups) to get a GroupedDataFrame, which is a collection of SubDataFrames (views into df), one per group.
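If a Dict keyed by group label is what you are after, one way to get there from the GroupedDataFrame is sketched below. It assumes a recent DataFrames.jl release (1.x); the names gdf and group_dict are just illustrative, and this is only one of several possible conversions:

```julia
using DataFrames

df = DataFrame(groups = ["A", "B", "A"],
               val1 = [1, 2, -4],
               val2 = [3, 9, 1])

# Iterating a GroupedDataFrame yields one SubDataFrame per group
gdf = groupby(df, :groups)

# first(sdf.groups) is the group's label; select copies the remaining
# columns into a new, standalone DataFrame
group_dict = Dict(first(sdf.groups) => select(sdf, Not(:groups)) for sdf in gdf)
```

If you only need to loop over the groups (e.g. to fit one model per group), iterating gdf directly avoids the copies.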

Upvotes: 3
