user10969675

Reputation: 59

Get and graph groupby result distribution of a column

I want to graph the distribution of a label column within each group. I managed to do it by creating dummies, building a crosstab for each group, and then looping to assemble a new dataframe. I am looking for a shorter way, maybe with more advanced groupby methods? I also don't know how to create a side-by-side bar chart instead of a stacked bar chart.

To recreate the dataframe:

import pandas as pd
import numpy as np

np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'],
                     size=30, p=[0.1, 0.2, 0.3, 0.3, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})

The code that produces my desired graph:

dummy = pd.get_dummies(df['region'])
region_lst = []
label_0 = []
label_1 = []
for col in dummy.columns:
    region_lst.append(col)
    label_0.append(pd.crosstab(dummy[col], df['label']).iloc[1,0])
    label_1.append(pd.crosstab(dummy[col], df['label']).iloc[1,1])

df_labels = pd.DataFrame({'label_0': label_0, 'label_1': label_1}, index=region_lst)
df_labels.plot.bar()

Upvotes: 2

Views: 624

Answers (2)

jezrael

Reputation: 862511

Use crosstab with DataFrame.add_prefix to get the same output as your longer code:

pd.crosstab(df['region'], df['label']).add_prefix('label_').plot.bar()

Details:

df_labels = pd.crosstab(df['region'], df['label']).add_prefix('label_')
print(df_labels)
label     label_0  label_1
region                    
region_A        2        3
region_B        3        3
region_C        5        4
region_D        3        6
region_E        1        0
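
If "distribution" means shares rather than raw counts, crosstab's normalize argument may help. This is a minimal sketch on the question's sample data, assuming per-region proportions are what is wanted; the name props is only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'],
                     size=30, p=[0.1, 0.2, 0.3, 0.3, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})

# normalize='index' divides each row by its total, so every region's row sums to 1.
props = (pd.crosstab(df['region'], df['label'], normalize='index')
           .add_prefix('label_'))
print(props)
```

The result can be plotted with props.plot.bar() in the same way as the counts table.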

If you need to remove the axis names label and region:

df_labels = (pd.crosstab(df['region'], df['label'])
               .add_prefix('label_')
               .rename_axis(index=None, columns=None))
print(df_labels)
          label_0  label_1
region_A        2        3
region_B        3        3
region_C        5        4
region_D        3        6
region_E        1        0
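
Since the question asks about groupby specifically, the same table can probably be built with groupby, size and unstack. This is a sketch of an alternative, not a replacement for crosstab; the name counts is only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'],
                     size=30, p=[0.1, 0.2, 0.3, 0.3, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})

# Count rows per (region, label) pair, then move the labels into columns.
# fill_value=0 puts a zero where a region has no rows for a label.
counts = (df.groupby(['region', 'label'])
            .size()
            .unstack(fill_value=0)
            .add_prefix('label_'))
print(counts)
```

crosstab is shorter for this case; the groupby form may be easier to extend if you later need an aggregation other than a count.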

Upvotes: 2

mozway

Reputation: 260420

You can use a crosstab:

pd.crosstab(df['region'], df['label']).plot.bar()

output: [image: bar chart of label counts per region]

intermediate crosstab:

label     0  1
region        
region_A  2  3
region_B  3  3
region_C  5  4
region_D  3  6
region_E  1  0
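
On the side-by-side part of the question: DataFrame.plot.bar draws grouped (side-by-side) bars by default, and passing stacked=True switches to stacked bars. A minimal sketch, where draw_bars is a hypothetical helper name and matplotlib is assumed to be installed for the actual drawing:

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
np.random.seed(1)
a = np.random.choice(['region_A', 'region_B', 'region_C', 'region_D', 'region_E'],
                     size=30, p=[0.1, 0.2, 0.3, 0.3, 0.1])
b = np.random.choice(['1', '0'], size=30, p=[0.5, 0.5])
df = pd.DataFrame({'region': a, 'label': b})

counts = pd.crosstab(df['region'], df['label'])


def draw_bars(table, stacked=False):
    """Hypothetical helper: bar chart of a counts table.

    stacked=False (the pandas default) puts one bar per label next to each
    other for every region; stacked=True piles them into a single bar.
    """
    # rot=0 keeps the region names horizontal on the x axis.
    return table.plot.bar(stacked=stacked, rot=0)
```

draw_bars(counts) gives the side-by-side chart, and draw_bars(counts, stacked=True) gives the stacked one.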

Upvotes: 0
