Lili

Reputation: 371

Calculate percentage of categorical column using conditional groupby and count in Python

For each id, I want to calculate the fraction of that id's rows in which col1 is True.

Here is an example of my data:

id     col1    
 1     True
 1     True
 1     False
 1     True
 2     False
 2     False

The new column should look like this:

id     col1    num_true
 1     True     0.75
 1     True     0.75
 1     False    0.75
 1     True     0.75
 2     False    0
 2     False    0

This is what I tried to do:

df['num_true']= df[df['col1'] == 'True'].groupby('id')['col1'].count()
df['num_col1_id']= df.groupby('id')['col1'].transform('count')

df['perc_true']= df.num_true/df.num_col1_id

Upvotes: 2

Views: 1016

Answers (2)

wwnde

Reputation: 26676

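Group by id and use transform('mean'). The mean of a boolean column is the fraction of True values, and transform broadcasts each group's result back to its rows.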

df['num_true']=df.groupby('id').col1.transform('mean')



  id   col1  num_true
0   1   True      0.75
1   1   True      0.75
2   1  False      0.75
3   1   True      0.75
4   2  False      0.00
5   2  False      0.00
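For anyone who wants to reproduce this, here is a minimal self-contained sketch built on the sample data from the question. It assumes col1 holds real booleans. The second half covers a hypothetical case in which col1 holds the strings "True" and "False" instead (the name df_str is made up for illustration); the comparison against 'True' in the question hints at that possibility, but the question does not confirm it.

```python
import pandas as pd

# Sample data from the question; col1 is assumed to hold real booleans.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "col1": [True, True, False, True, False, False],
})

# The mean of a boolean column is the fraction of True values;
# transform broadcasts each group's result back to its rows.
df["num_true"] = df.groupby("id")["col1"].transform("mean")

# Hypothetical variant: if col1 held the strings "True"/"False",
# compare first to get booleans, then average within each id.
df_str = df.assign(col1=df["col1"].map({True: "True", False: "False"}))
df_str["num_true"] = (df_str["col1"] == "True").groupby(df_str["id"]).transform("mean")

print(df)
print(df_str)
```

Note that comparing a genuinely boolean column against the string 'True' matches no rows, so it is worth checking df.dtypes before choosing between the two forms.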

Upvotes: 5

m.c

Reputation: 48

Here is the code you asked for:

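import pandas as pd

# The ids are stored in the index here rather than in a column.
df = pd.DataFrame({"col1": [True, True, False, True, False, False]}, index=[1, 1, 1, 1, 2, 2])
# Select col1 so the division yields a Series, which aligns on the index when assigned.
grouped = df.groupby(df.index)["col1"]
df["num_true"] = grouped.sum() / grouped.count()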

I group the dataframe by its index, then sum the True values in each group and divide by the total number of values in that group.

Result:

    col1    num_true
1   True    0.75
1   True    0.75
1   False   0.75
1   True    0.75
2   False   0.00
2   False   0.00
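If id is an ordinary column, as in the question, rather than the index, the same sum-divided-by-count idea might look like the sketch below. The name share_per_id is made up for illustration, and col1 is assumed to be boolean.

```python
import pandas as pd

# Sample data from the question, with id kept as a regular column.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "col1": [True, True, False, True, False, False],
})

grouped = df.groupby("id")["col1"]
# One value per id: number of True rows divided by number of rows.
share_per_id = grouped.sum() / grouped.count()

# Look up each row's id in the per-id Series to fill the new column.
df["num_true"] = df["id"].map(share_per_id)

print(share_per_id)
print(df)
```

Keeping the per-id Series around is handy when a one-row-per-id summary is needed alongside the row-level column.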

Upvotes: 2
