Reputation: 2482
Hi I am looking to create some kind of frequency table where i can get row (or column) percentages, with the purpose of being able to plot the data side by side in a bar chart. The output would look something like this:
Male Female
Mon 21% 79%
Tues 33% 67%
Weds 11% 89%
Currently my data looks like this:
Day Gender
Mon Male
Mon Female
Mon Female
Weds Female
Fri Male
In the end I just want 5 bar plots (one for each day) with two bars for each gender. I tried groupby:
df.groupby(['day','gender']).size()
But this only gives raw counts and both histograms are uninterpretable and bar plots will look weird since I cannot normalize (certain days have much higher volumes than other days). any help will be appreciated!
Upvotes: 1
Views: 160
Reputation: 109526
After you do your groupby above, you then need to unstack the results (which makes a pivot table).
gb = df.groupby(['Day','Gender']).size().unstack()
>>> gb
Gender Female Male
Day
Fri NaN 1.000000
Mon 0.666667 0.333333
Weds 1.000000 NaN
You may wish to fill NaNs with zeros (.fillna(0)
).
Now you can divide this result by the sum of the rows:
>>> gb.div(gb.sum(axis=1).values, axis=0)
Gender Female Male
Day
Fri NaN 1.000000
Mon 0.666667 0.333333
Weds 1.000000 NaN
You can plot this by appending .plot(kind='bar')
Upvotes: 2