Reputation: 1037
I've a dataframe with 2 columns. I'm trying to calculate the % of TypeB on number of records in ID as follows:
Formula: (Count of TypeB) / (No of records in Group) * 100
Result :
001 = (2/3) * 100 => 66.66
002 = (0/2) * 100 => 0
003 = (1/1) * 100 => 100
Dataframe
ID Type
001 TypeA
001 TypeB
001 TypeB
002 TypeA
002 TypeA
003 TypeB
So, far I've been able to groupBy
byID = df.groupby('ID')
I was reading panda's crosstab, but can't seem to figure out how to approach the solution
Upvotes: 3
Views: 1745
Reputation: 862661
You can use groupby
with size
for count length of groups first, reshape by unstack
with filling NaN
by 0
and then divide column TypeB
by sum
, last multiple by 100
:
df = df.groupby(['ID','Type']).size().unstack(fill_value=0)
print (df)
Type TypeA TypeB
ID
1 1 2
2 2 0
3 0 1
df1 = df.TypeB.div(df.sum(axis=1)).mul(100).reset_index(name='percentage')
print (df1)
ID percentage
0 1 66.666667
1 2 0.000000
2 3 100.000000
For reshape is possible use crosstab
, but it is a bit slowier in larger dataframe:
df = pd.crosstab(df.ID,df.Type)
print (df)
Type TypeA TypeB
ID
1 1 2
2 2 0
3 0 1
EDIT:
You can use map
for adding new column:
df1 = df.groupby(['ID','Type']).size().unstack(fill_value=0)
print (df1)
Type TypeA TypeB
ID
1 1 2
2 2 0
3 0 1
df2 = df1.TypeB.div(df1.sum(axis=1)).mul(100)
print (df2)
ID
1 66.666667
2 0.000000
3 100.000000
dtype: float64
df['percentage'] = df.ID.map(df2)
print (df)
ID Type percentage
0 1 TypeA 66.666667
1 1 TypeB 66.666667
2 1 TypeB 66.666667
3 2 TypeA 0.000000
4 2 TypeA 0.000000
5 3 TypeB 100.000000
Upvotes: 2
Reputation: 294258
Use groupby
with value_counts(normalize=True)
normalize
will automatically divide counts by totals.
df.groupby('ID').Type.value_counts(normalize=True).unstack(fill_value=0).TypeB
ID
001 0.666667
002 0.000000
003 1.000000
Name: TypeB, dtype: float64
Upvotes: 1