Reputation: 35
I have performed calculations on subsets of a DataFrame by using the groupby function:
using RDatasets, DataFrames, Statistics, StatsPlots
iris = dataset("datasets", "iris")
describe(iris)
iris_grouped = groupby(iris,:Species)
iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame)
Now I would like to plot the results, but I get an error message for the following plot:
@df iris_avg bar(:Species,:SepalLength)
Only tables are supported
What would be the best way to plot the data? My idea would be to create a single DataFrame and go from there. How would I do this, i.e., how do I convert a GroupedDataFrame to a single DataFrame? Thanks!
Upvotes: 2
Views: 668
Reputation: 1040
I think you might be better off using the by function to get to your iris_avg directly. by splits a DataFrame into groups on the given column(s), applies the given function to each group, and combines the results into a single DataFrame. It is often used with a do block.
julia> by(iris, :Species) do df
DataFrame(sepal_mean = mean(df.SepalLength))
end
3×2 DataFrame
│ Row │ Species │ sepal_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
Or equivalently,
julia> by(iris, :Species, SepalLength_mean = :SepalLength => mean)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
See here for more details/examples.
Alternatively, you can do it in several steps as you've done, then use the DataFrame constructor to convert the result to a proper DataFrame:
julia> iris_grouped = groupby(iris,:Species);
julia> iris_avg = map(:SepalLength => mean,iris_grouped::GroupedDataFrame);
julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
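For readers on a recent DataFrames.jl: as far as I know, by was deprecated and later removed (around the 1.0 release), and the same split-apply-combine step is written as combine over groupby. The following is a minimal sketch under that assumption; the small df table is a hypothetical stand-in for iris so that it runs without RDatasets.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the iris data (not the real dataset).
df = DataFrame(Species = ["setosa", "setosa", "versicolor", "versicolor"],
               SepalLength = [5.0, 5.2, 6.0, 5.8])

# Do-block form: the block receives each group as a sub-DataFrame
# and returns a NamedTuple that becomes one row of the result.
avg = combine(groupby(df, :Species)) do sdf
    (sepal_mean = mean(sdf.SepalLength),)
end

# Pair form: source column => function => output column name.
avg2 = combine(groupby(df, :Species), :SepalLength => mean => :sepal_mean)
```

Both calls return an ordinary DataFrame with one row per species, so no separate conversion step is needed.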
Upvotes: 4
Reputation: 69949
To convert a GroupedDataFrame into a DataFrame, just call DataFrame on it, e.g.:
julia> DataFrame(iris_avg)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
in your case.
You could also have written:
julia> combine(:SepalLength => mean, iris_grouped)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
on an original GroupedDataFrame, or
julia> by(:SepalLength => mean, iris, :Species)
3×2 DataFrame
│ Row │ Species │ SepalLength_mean │
│ │ Categorical… │ Float64 │
├─────┼──────────────┼──────────────────┤
│ 1 │ setosa │ 5.006 │
│ 2 │ versicolor │ 5.936 │
│ 3 │ virginica │ 6.588 │
on an original DataFrame.
I write the transformation as the first argument here, but typically, you would write it as the last (as then you can pass multiple transformations), e.g.:
julia> by(iris, :Species, :SepalLength => mean, :SepalWidth => minimum)
3×3 DataFrame
│ Row │ Species │ SepalLength_mean │ SepalWidth_minimum │
│ │ Categorical… │ Float64 │ Float64 │
├─────┼──────────────┼──────────────────┼────────────────────┤
│ 1 │ setosa │ 5.006 │ 2.3 │
│ 2 │ versicolor │ 5.936 │ 2.0 │
│ 3 │ virginica │ 6.588 │ 2.2 │
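To tie this back to the plot in the question, here is a minimal sketch of the multi-transformation form using combine on a GroupedDataFrame, which I believe is the spelling current DataFrames.jl releases expect in place of by. The df table is a hypothetical stand-in for iris, and the plotting lines are left as comments because they assume StatsPlots is installed. Note that the aggregated column is named SepalLength_mean, not SepalLength, so the bar call has to refer to the new name.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the iris data (not the real dataset).
df = DataFrame(Species = ["setosa", "setosa", "versicolor", "versicolor"],
               SepalLength = [5.0, 5.2, 6.0, 5.8],
               SepalWidth = [3.5, 3.0, 2.2, 2.7])

gdf = groupby(df, :Species)

# Several transformations at once; output names are generated as
# <column>_<function> when no explicit name is given.
stats = combine(gdf, :SepalLength => mean, :SepalWidth => minimum)

# Plotting the result (assumes the StatsPlots package is available):
# using StatsPlots
# @df stats bar(:Species, :SepalLength_mean)
```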
Upvotes: 7