Reputation: 2558
I have the following DataFrame:
julia> df6
135×4 DataFrame
│ Row │ County │ Year │ Female │ Male │
│ │ String │ Int64 │ Int64⍰ │ Int64⍰ │
├─────┼─────────────┼───────┼────────┼────────┤
│ 1 │ Asotin │ 2008 │ 1 │ 0 │
│ 2 │ Asotin │ 2009 │ 0 │ 0 │
│ 3 │ Asotin │ 2010 │ 0 │ 0 │
│ 4 │ Asotin │ 2011 │ 0 │ 0 │
│ 5 │ Asotin │ 2012 │ 0 │ 0 │
│ 6 │ Benton │ 2008 │ 1 │ 0 │
│ 7 │ Benton │ 2009 │ 0 │ 0 │
│ 8 │ Benton │ 2010 │ 0 │ 0 │
│ 9 │ Benton │ 2011 │ 0 │ 0 │
│ 10 │ Benton │ 2012 │ 0 │ 0 │
│ 11 │ Chelan │ 2008 │ 1 │ 0 │
│ 12 │ Chelan │ 2009 │ 1 │ 0 │
│ 13 │ Chelan │ 2010 │ 0 │ 1 │
│ 14 │ Chelan │ 2011 │ 0 │ 0 │
│ 15 │ Chelan │ 2012 │ 0 │ 2 │
│ 16 │ Clallam │ 2008 │ 0 │ 0 │
│ 17 │ Clallam │ 2009 │ 0 │ 0 │
│ 18 │ Clallam │ 2010 │ 0 │ 0 │
│ 19 │ Clallam │ 2011 │ 1 │ 1 │
│ 20 │ Clallam │ 2012 │ 0 │ 0 │
│ 21 │ Clark │ 2008 │ 0 │ 1 │
⋮
│ 114 │ Thurston │ 2011 │ 0 │ 0 │
│ 115 │ Thurston │ 2012 │ 0 │ 0 │
│ 116 │ Walla Walla │ 2008 │ 0 │ 0 │
│ 117 │ Walla Walla │ 2009 │ 0 │ 1 │
│ 118 │ Walla Walla │ 2010 │ 0 │ 0 │
│ 119 │ Walla Walla │ 2011 │ 0 │ 0 │
│ 120 │ Walla Walla │ 2012 │ 0 │ 0 │
│ 121 │ Whatcom │ 2008 │ 0 │ 0 │
│ 122 │ Whatcom │ 2009 │ 1 │ 0 │
│ 123 │ Whatcom │ 2010 │ 0 │ 1 │
│ 124 │ Whatcom │ 2011 │ 1 │ 1 │
│ 125 │ Whatcom │ 2012 │ 0 │ 1 │
│ 126 │ Whitman │ 2008 │ 0 │ 0 │
│ 127 │ Whitman │ 2009 │ 0 │ 0 │
│ 128 │ Whitman │ 2010 │ 0 │ 1 │
│ 129 │ Whitman │ 2011 │ 0 │ 0 │
│ 130 │ Whitman │ 2012 │ 0 │ 0 │
│ 131 │ Yakima │ 2008 │ 0 │ 0 │
│ 132 │ Yakima │ 2009 │ 0 │ 1 │
│ 133 │ Yakima │ 2010 │ 1 │ 2 │
│ 134 │ Yakima │ 2011 │ 0 │ 3 │
│ 135 │ Yakima │ 2012 │ 0 │ 1 │
The following code draws a line chart of male and female suicides for King County:
line_chart = @pipe df6 |>
    filter(row -> row[:County] == "King", _) |>
    plot(_.Year, [_.Male, _.Female],
        title = "King County Youth Suicides",
        label = ["Male" "Female"],
        xlabel = "Year",
        ylabel = "Suicides",
        size = (700, 700)
    )
Grouped Data:
julia> grouped_data = @pipe df6|>
groupby(_, :County)
GroupedDataFrame with 27 groups based on key: County
First Group (5 rows): County = "Asotin"
│ Row │ County │ Year │ Female │ Male │
│ │ String │ Int64 │ Int64⍰ │ Int64⍰ │
├─────┼────────┼───────┼────────┼────────┤
│ 1 │ Asotin │ 2008 │ 1 │ 0 │
│ 2 │ Asotin │ 2009 │ 0 │ 0 │
│ 3 │ Asotin │ 2010 │ 0 │ 0 │
│ 4 │ Asotin │ 2011 │ 0 │ 0 │
│ 5 │ Asotin │ 2012 │ 0 │ 0 │
⋮
Last Group (5 rows): County = "Yakima"
│ Row │ County │ Year │ Female │ Male │
│ │ String │ Int64 │ Int64⍰ │ Int64⍰ │
├─────┼────────┼───────┼────────┼────────┤
│ 1 │ Yakima │ 2008 │ 0 │ 0 │
│ 2 │ Yakima │ 2009 │ 0 │ 1 │
│ 3 │ Yakima │ 2010 │ 1 │ 2 │
│ 4 │ Yakima │ 2011 │ 0 │ 3 │
│ 5 │ Yakima │ 2012 │ 0 │ 1 │
Trying to draw a line chart of male and female suicides for each county throws the following error:
julia> line_chart = @pipe df6 |>
           groupby(_, :County) |>
           plot(_.Year, [_.Male, _.Female],
               title = "King County Youth Suicides",
               label = ["Male" "Female"],
               xlabel = "Year",
               ylabel = "Suicides",
               size = (700, 700)
           )
ERROR: type GroupedDataFrame has no field Year
Stacktrace:
[1] getproperty(::GroupedDataFrame{DataFrame}, ::Symbol) at ./Base.jl:33
[2] top-level scope at REPL[27]:1
Update 1:
The following code draws line charts for male and female suicides of each county:
counties = unique(df6.County)
line_chart = plot(lw = 3,
    title = "Youth Suicides",
    xlabel = "Year",
    ylabel = "Suicides",
    size = (1200, 1000)
)
for county in counties
    @pipe df6 |>
        filter(row -> row[:County] == county, _) |>
        plot!(line_chart, _.Year, [_.Male, _.Female],
            label = ["$county Male" "$county Female"]
        )
    println("County : $county")
end
savefig(line_chart, "line_chart.pdf")
How can I draw line charts of male and female suicides for each county correctly and efficiently?
Upvotes: 2
Views: 110
Reputation: 126
Here's a solution that does not use piping (it could be piped too).
Group by county and year, then sum Male and Female within each group:
gdf6 = combine(groupby(df6, [:County, :Year]), :Male => sum, :Female => sum)
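To see what that call returns, here is a minimal sketch on made-up data. The toy table and the name toy_sums are hypothetical and only mimic the columns of df6; a recent DataFrames.jl (1.x) is assumed. The point is that combine names the new columns Male_sum and Female_sum, which are the names the plotting commands rely on.

```julia
using DataFrames

# Hypothetical toy data with the same column names as df6 (values are invented).
toy = DataFrame(
    County = ["A", "A", "A", "B"],
    Year   = [2008, 2008, 2009, 2008],
    Female = [1, 0, 2, 0],
    Male   = [0, 1, 0, 3],
)

# One output row per (County, Year) pair; sums are stored in Male_sum and Female_sum.
toy_sums = combine(groupby(toy, [:County, :Year]), :Male => sum, :Female => sum)
```

If every (County, Year) pair already occurs exactly once, as it appears to in the excerpt of df6, the sums simply reproduce the original values.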
Use StatsPlots.jl:
using StatsPlots
@df gdf6 plot(:Year, :Male_sum, group={County_Male=:County})
@df gdf6 plot!(:Year, :Female_sum, group={County_Female=:County})
Another option is to stack the DataFrame before plotting, so that a single plot command is enough:
sdf6 = stack(gdf6, [:Male_sum, :Female_sum])
@df sdf6 plot(:Year, :value, group=(county=:County, sex=:variable))
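As a rough illustration of what stack produces, the sketch below uses a small made-up table (the names wide and long are hypothetical, and DataFrames.jl 1.x is assumed). The two measure columns are melted into a variable column holding the former column name and a value column holding the count, which is why the plot call groups on :variable and plots :value.

```julia
using DataFrames

# Hypothetical wide table shaped like the summed data (values are invented).
wide = DataFrame(
    County     = ["A", "B"],
    Year       = [2008, 2008],
    Male_sum   = [1, 3],
    Female_sum = [1, 0],
)

# Melt the two measure columns into long format:
# County and Year are kept as identifier columns,
# `variable` holds the old column name, `value` holds the number.
long = stack(wide, [:Male_sum, :Female_sum])
```

Each original row now appears twice, once per measure, so a single grouped plot call can draw both series for every county.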
If you want a bar plot instead, replace plot with groupedbar, which might look better.
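For context, the error in the question arises because a GroupedDataFrame is a collection of sub-DataFrames rather than a single table, so it has no Year field of its own. If a manual loop is preferred over StatsPlots, the groups can be iterated directly instead of filtering df6 once per county. The sketch below uses made-up data and a hypothetical series dictionary, assumes DataFrames.jl 1.x, and leaves the actual plot! call as a comment.

```julia
using DataFrames

# Hypothetical toy data with the same column names as df6 (values are invented).
toy = DataFrame(
    County = ["A", "A", "B", "B"],
    Year   = [2008, 2009, 2008, 2009],
    Female = [1, 0, 0, 2],
    Male   = [0, 1, 3, 0],
)

# Collect one (year, male, female) bundle per county.
series = Dict{String, NamedTuple}()
for (key, sub) in pairs(groupby(toy, :County))
    # key.County is the group's county name; sub holds only that county's rows.
    series[key.County] = (year = sub.Year, male = sub.Male, female = sub.Female)
    # With Plots.jl loaded, this is where a call such as
    # plot!(line_chart, sub.Year, [sub.Male, sub.Female]) would go.
end
```

This splits the table once, whereas the filter-based loop scans the whole DataFrame for every county.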
Upvotes: 2