hooman

Reputation: 35

How to convert a GroupedDataFrame to a DataFrame in Julia?

I have performed calculations on subsets of a DataFrame by using the groupby function:

using RDatasets, Statistics, StatsPlots
iris = dataset("datasets", "iris")
describe(iris)
iris_grouped = groupby(iris,:Species)
iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame)

Now I would like to plot the results, but I get an error message for the following plot:

@df iris_avg bar(:Species,:SepalLength)

Only tables are supported

What would be the best way to plot the data? My idea is to create a single DataFrame and go from there. How would I do this, i.e., how do I convert a GroupedDataFrame to a single DataFrame? Thanks!

Upvotes: 2

Views: 668

Answers (2)

kevbonham

Reputation: 1040

I think you might be better off using the by function to get to your iris_avg directly. by splits a DataFrame into groups according to the given column(s), applies the given function to each group, and combines the results into a single DataFrame. It's often used with a do block.

julia> by(iris, :Species) do df
           DataFrame(sepal_mean = mean(df.SepalLength))
       end
3×2 DataFrame
│ Row │ Species      │ sepal_mean │
│     │ Categorical… │ Float64    │
├─────┼──────────────┼────────────┤
│ 1   │ setosa       │ 5.006      │
│ 2   │ versicolor   │ 5.936      │
│ 3   │ virginica    │ 6.588      │

Or equivalently,

julia> by(iris, :Species, SepalLength_mean = :SepalLength => mean)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │

See the DataFrames.jl documentation on the split-apply-combine strategy for more details and examples.

Alternatively, you can do it in several steps as you've done, then use the DataFrame constructor to convert the result to a proper DataFrame:

julia> iris_grouped = groupby(iris,:Species);

julia> iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame);

julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │

Upvotes: 4

Bogumił Kamiński

Reputation: 69949

To convert a GroupedDataFrame into a DataFrame, just call DataFrame on it, e.g.:

julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │

in your case.

You could also have written:

julia> combine(:SepalLength => mean, iris_grouped)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │

on the original GroupedDataFrame, or

julia> by(:SepalLength => mean, iris, :Species)
3×2 DataFrame
│ Row │ Species      │ SepalLength_mean │
│     │ Categorical… │ Float64          │
├─────┼──────────────┼──────────────────┤
│ 1   │ setosa       │ 5.006            │
│ 2   │ versicolor   │ 5.936            │
│ 3   │ virginica    │ 6.588            │

on the original DataFrame.

I wrote the transformation as the first argument here, but typically you would write it last, because that form lets you pass multiple transformations, e.g.:

julia> by(iris, :Species, :SepalLength => mean, :SepalWidth => minimum)
3×3 DataFrame
│ Row │ Species      │ SepalLength_mean │ SepalWidth_minimum │
│     │ Categorical… │ Float64          │ Float64            │
├─────┼──────────────┼──────────────────┼────────────────────┤
│ 1   │ setosa       │ 5.006            │ 2.3                │
│ 2   │ versicolor   │ 5.936            │ 2.0                │
│ 3   │ virginica    │ 6.588            │ 2.2                │
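
The calls above reflect the DataFrames.jl API at the time of writing. As far as I know, by was later deprecated in favour of combine applied to the result of groupby, so on a recent release the same computation might look like the sketch below. The small hand-built table and the names df, gdf, avg and flat are hypothetical stand-ins for the iris data, not part of the original example.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the iris data (made-up values, not the real dataset)
df = DataFrame(Species = ["a", "a", "b", "b"],
               SepalLength = [1.0, 3.0, 4.0, 6.0])

# Split the rows into groups by :Species; the result is a GroupedDataFrame
gdf = groupby(df, :Species)

# Apply mean to :SepalLength within each group; combine returns a plain
# DataFrame with one row per group and a column named SepalLength_mean
avg = combine(gdf, :SepalLength => mean)

# Calling the DataFrame constructor on a GroupedDataFrame stacks all groups
# back into a single DataFrame
flat = DataFrame(gdf)
```

Here avg is already a DataFrame, so it can be handed to a plotting macro that expects a table without any further conversion.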

Upvotes: 7
