user6083088
user6083088

Reputation: 1037

Python Pandas GroupBy % calculation

I've a dataframe with 2 columns. I'm trying to calculate the % of TypeB on number of records in ID as follows:

Formula: (Count of TypeB) / (No of records in Group) * 100

Result :
001 = (2/3) * 100 => 66.66
002 = (0/2) * 100 => 0
003 = (1/1) * 100 => 100

Dataframe

ID   Type
001  TypeA
001  TypeB
001  TypeB
002  TypeA
002  TypeA
003  TypeB

So, far I've been able to groupBy

byID = df.groupby('ID')

I was reading panda's crosstab, but can't seem to figure out how to approach the solution

Upvotes: 3

Views: 1745

Answers (2)

jezrael
jezrael

Reputation: 862661

You can use groupby with size for count length of groups first, reshape by unstack with filling NaN by 0 and then divide column TypeB by sum, last multiple by 100:

df = df.groupby(['ID','Type']).size().unstack(fill_value=0)
print (df)
Type  TypeA  TypeB
ID                
1         1      2
2         2      0
3         0      1

df1 = df.TypeB.div(df.sum(axis=1)).mul(100).reset_index(name='percentage')
print (df1)
   ID  percentage
0   1   66.666667
1   2    0.000000
2   3  100.000000

For reshape is possible use crosstab, but it is a bit slowier in larger dataframe:

df = pd.crosstab(df.ID,df.Type)
print (df)
Type  TypeA  TypeB
ID                
1         1      2
2         2      0
3         0      1

EDIT:

You can use map for adding new column:

df1 = df.groupby(['ID','Type']).size().unstack(fill_value=0)
print (df1)
Type  TypeA  TypeB
ID                
1         1      2
2         2      0
3         0      1

df2 = df1.TypeB.div(df1.sum(axis=1)).mul(100)
print (df2)
ID
1     66.666667
2      0.000000
3    100.000000
dtype: float64

df['percentage'] = df.ID.map(df2)
print (df)
   ID   Type  percentage
0   1  TypeA   66.666667
1   1  TypeB   66.666667
2   1  TypeB   66.666667
3   2  TypeA    0.000000
4   2  TypeA    0.000000
5   3  TypeB  100.000000

Upvotes: 2

piRSquared
piRSquared

Reputation: 294258

Use groupby with value_counts(normalize=True)
normalize will automatically divide counts by totals.

df.groupby('ID').Type.value_counts(normalize=True).unstack(fill_value=0).TypeB

ID
001    0.666667
002    0.000000
003    1.000000
Name: TypeB, dtype: float64

Upvotes: 1

Related Questions