AVA

Reputation: 2558

How to draw a line chart of male and female counts for each county?

The DataFrame is as follows:

julia> df6
135×4 DataFrame
│ Row │ County      │ Year  │ Female │ Male   │
│     │ String      │ Int64 │ Int64⍰ │ Int64⍰ │
├─────┼─────────────┼───────┼────────┼────────┤
│ 1   │ Asotin      │ 2008  │ 1      │ 0      │
│ 2   │ Asotin      │ 2009  │ 0      │ 0      │
│ 3   │ Asotin      │ 2010  │ 0      │ 0      │
│ 4   │ Asotin      │ 2011  │ 0      │ 0      │
│ 5   │ Asotin      │ 2012  │ 0      │ 0      │
│ 6   │ Benton      │ 2008  │ 1      │ 0      │
│ 7   │ Benton      │ 2009  │ 0      │ 0      │
│ 8   │ Benton      │ 2010  │ 0      │ 0      │
│ 9   │ Benton      │ 2011  │ 0      │ 0      │
│ 10  │ Benton      │ 2012  │ 0      │ 0      │
│ 11  │ Chelan      │ 2008  │ 1      │ 0      │
│ 12  │ Chelan      │ 2009  │ 1      │ 0      │
│ 13  │ Chelan      │ 2010  │ 0      │ 1      │
│ 14  │ Chelan      │ 2011  │ 0      │ 0      │
│ 15  │ Chelan      │ 2012  │ 0      │ 2      │
│ 16  │ Clallam     │ 2008  │ 0      │ 0      │
│ 17  │ Clallam     │ 2009  │ 0      │ 0      │
│ 18  │ Clallam     │ 2010  │ 0      │ 0      │
│ 19  │ Clallam     │ 2011  │ 1      │ 1      │
│ 20  │ Clallam     │ 2012  │ 0      │ 0      │
│ 21  │ Clark       │ 2008  │ 0      │ 1      │
⋮
│ 114 │ Thurston    │ 2011  │ 0      │ 0      │
│ 115 │ Thurston    │ 2012  │ 0      │ 0      │
│ 116 │ Walla Walla │ 2008  │ 0      │ 0      │
│ 117 │ Walla Walla │ 2009  │ 0      │ 1      │
│ 118 │ Walla Walla │ 2010  │ 0      │ 0      │
│ 119 │ Walla Walla │ 2011  │ 0      │ 0      │
│ 120 │ Walla Walla │ 2012  │ 0      │ 0      │
│ 121 │ Whatcom     │ 2008  │ 0      │ 0      │
│ 122 │ Whatcom     │ 2009  │ 1      │ 0      │
│ 123 │ Whatcom     │ 2010  │ 0      │ 1      │
│ 124 │ Whatcom     │ 2011  │ 1      │ 1      │
│ 125 │ Whatcom     │ 2012  │ 0      │ 1      │
│ 126 │ Whitman     │ 2008  │ 0      │ 0      │
│ 127 │ Whitman     │ 2009  │ 0      │ 0      │
│ 128 │ Whitman     │ 2010  │ 0      │ 1      │
│ 129 │ Whitman     │ 2011  │ 0      │ 0      │
│ 130 │ Whitman     │ 2012  │ 0      │ 0      │
│ 131 │ Yakima      │ 2008  │ 0      │ 0      │
│ 132 │ Yakima      │ 2009  │ 0      │ 1      │
│ 133 │ Yakima      │ 2010  │ 1      │ 2      │
│ 134 │ Yakima      │ 2011  │ 0      │ 3      │
│ 135 │ Yakima      │ 2012  │ 0      │ 1      │

The following code draws a line chart of male and female suicides for King County:

line_chart = @pipe df6|>
                   filter(row -> row[:County] == ("King"), _) |>
                   plot(_.Year, [_.Male, _.Female],
                        title = "King County Youth Suicides",
                        label=["Male" "Female"],
                        xlabel="Year",
                        ylabel="Suicides",
                        size=(700,700)
                        )

Grouped Data:

julia> grouped_data = @pipe df6|>
                          groupby(_, :County)
GroupedDataFrame with 27 groups based on key: County
First Group (5 rows): County = "Asotin"
│ Row │ County │ Year  │ Female │ Male   │
│     │ String │ Int64 │ Int64⍰ │ Int64⍰ │
├─────┼────────┼───────┼────────┼────────┤
│ 1   │ Asotin │ 2008  │ 1      │ 0      │
│ 2   │ Asotin │ 2009  │ 0      │ 0      │
│ 3   │ Asotin │ 2010  │ 0      │ 0      │
│ 4   │ Asotin │ 2011  │ 0      │ 0      │
│ 5   │ Asotin │ 2012  │ 0      │ 0      │
⋮
Last Group (5 rows): County = "Yakima"
│ Row │ County │ Year  │ Female │ Male   │
│     │ String │ Int64 │ Int64⍰ │ Int64⍰ │
├─────┼────────┼───────┼────────┼────────┤
│ 1   │ Yakima │ 2008  │ 0      │ 0      │
│ 2   │ Yakima │ 2009  │ 0      │ 1      │
│ 3   │ Yakima │ 2010  │ 1      │ 2      │
│ 4   │ Yakima │ 2011  │ 0      │ 3      │
│ 5   │ Yakima │ 2012  │ 0      │ 1      │

Trying to draw a line chart of male and female suicides for each county from the grouped data throws the following error:

julia> line_chart = @pipe df6|>
                          groupby(_, :County) |>
                          plot(_.Year, [_.Male, _.Female],
                               title = "King County Youth Suicides",
                               label=["Male" "Female"],
                               xlabel="Year",
                               ylabel="Suicides",
                               size=(700,700)
                               )
ERROR: type GroupedDataFrame has no field Year
Stacktrace:
 [1] getproperty(::GroupedDataFrame{DataFrame}, ::Symbol) at ./Base.jl:33
 [2] top-level scope at REPL[27]:1

Update 1:

The following code draws line charts for male and female suicides of each county:

counties = unique(df6.County)

line_chart = plot(lw = 3,
                  title = "Youth Suicides",
                  xlabel = "Year",
                  ylabel = "Suicides",
                  size = (1200, 1000))
for county in counties
    @pipe df6 |>
          filter(row -> row[:County] == county, _) |>
          plot!(line_chart, _.Year, [_.Male, _.Female],
                label = ["$county Male" "$county Female"])
    println("County : $county")
end
savefig(line_chart, "line_chart.pdf")

How can I draw line charts of male and female suicides for each county correctly and efficiently?

Upvotes: 2

Views: 110

Answers (1)

hdavid16

Reputation: 126

Here's a solution that does not use piping (it could be piped too).

Group by county and year, then sum Male and Female within each group:

gdf6 = combine(groupby(df6, [:County, :Year]), :Male => sum, :Female => sum)
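To make the shape of the aggregated result concrete, here is a minimal sketch on a small made-up table. The name toy and its values are hypothetical and only mirror the columns of df6. It assumes a DataFrames.jl version that supports combine on a GroupedDataFrame.

```julia
using DataFrames

# Hypothetical toy data with the same columns as df6 (values are made up).
toy = DataFrame(
    County = ["Asotin", "Asotin", "Yakima", "Yakima"],
    Year   = [2008, 2009, 2008, 2009],
    Female = [1, 0, 0, 0],
    Male   = [0, 0, 0, 1],
)

# Same call as above: one output row per (County, Year) pair.
gtoy = combine(groupby(toy, [:County, :Year]), :Male => sum, :Female => sum)

# The summed columns are named by appending "_sum" to the source column name.
println(names(gtoy))
```

The Male_sum and Female_sum names are what the plotting calls refer to. If the real columns contain missing values, sum returns missing for that group, so wrapping the column in skipmissing first may be needed.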

Use StatsPlots.jl:

@df gdf6 plot(:Year, :Male_sum, group={County_Male=:County})

@df gdf6 plot!(:Year, :Female_sum, group={County_Female=:County})

Another option is to stack the DataFrame before plotting, so that a single plot command is enough:

sdf6 = stack(gdf6, [:Male_sum, :Female_sum])

@df sdf6 plot(:Year, :value, group=(county=:County, sex=:variable))
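The :variable and :value columns used in that call are created by stack. A minimal sketch on made-up data shows where they come from. The name gtoy and its values are hypothetical, and the default output column names of stack are assumed.

```julia
using DataFrames

# Hypothetical aggregated data with the same column names as gdf6 (values are made up).
gtoy = DataFrame(
    County     = ["Asotin", "Yakima"],
    Year       = [2008, 2008],
    Male_sum   = [0, 2],
    Female_sum = [1, 0],
)

# Wide to long: each (County, Year) row becomes one row per stacked column.
# By default the source column name goes into `variable` and its number into `value`.
stoy = stack(gtoy, [:Male_sum, :Female_sum])

println(names(stoy))
println(nrow(stoy))
```

With the data in this long form, a single grouping on county and variable separates every county and sex combination into its own series.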

If you want a bar plot instead, replace plot with groupedbar, which might look better.

Upvotes: 2
