eljusticiero67
eljusticiero67

Reputation: 2482

Getting Row percentages in python to plot

Hi I am looking to create some kind of frequency table where i can get row (or column) percentages, with the purpose of being able to plot the data side by side in a bar chart. The output would look something like this:

       Male  Female
 Mon   21%     79%
 Tues  33%     67%
 Weds  11%     89%

Currently my data looks like this:

Day  Gender
Mon  Male
Mon  Female
Mon  Female
Weds Female
Fri  Male

In the end I just want 5 bar plots (one for each day) with two bars for each gender. I tried groupby:

df.groupby(['day','gender']).size()

But this only gives raw counts and both histograms are uninterpretable and bar plots will look weird since I cannot normalize (certain days have much higher volumes than other days). any help will be appreciated!

Upvotes: 1

Views: 160

Answers (1)

Alexander
Alexander

Reputation: 109526

After you do your groupby above, you then need to unstack the results (which makes a pivot table).

gb = df.groupby(['Day','Gender']).size().unstack()
>>> gb
Gender    Female      Male
Day                       
Fri          NaN  1.000000
Mon     0.666667  0.333333
Weds    1.000000       NaN

You may wish to fill NaNs with zeros (.fillna(0)).

Now you can divide this result by the sum of the rows:

>>> gb.div(gb.sum(axis=1).values, axis=0)
Gender    Female      Male
Day                       
Fri          NaN  1.000000
Mon     0.666667  0.333333
Weds    1.000000       NaN

You can plot this by appending .plot(kind='bar')

Upvotes: 2

Related Questions