hugo
hugo

Reputation: 475

Python how to count boolean in dataframe and compute percentage

I would like to do 2 manupilation with pandas but I cannot manage to do it.

I have a dataframe that looks like this :

df = pd.DataFrame({'Name': {0: 'Eric', 1: 'Mattieu',2: 'Eric',3: 'Mattieu', 4: 'Mattieu',5: 'Eric',6:'Mattieu',7:'Franck',8:'Franck',9:'Jack',10:'Jack'},
    'Value': {0: False, 1:False,2:True,3:False, 4:True,5: True,6: False,7:True,8:True,9:False,10:False},
                  })

df=df.sort_values(["Name"])
df

output:

       Name  Value
0      Eric  False
2      Eric   True
5      Eric   True
7    Franck   True
8    Franck   True
9      Jack  False
10     Jack  False
1   Mattieu  False
3   Mattieu  False
4   Mattieu   True
6   Mattieu  False

Manupilation 1 : I would like to have the Number of True , False, Total value and the mean of True Value for each Name, like this :

    Name    Nbr True    Nbr False   Total Value   Mean (True/(False+True))
0   Eric    2           1           3             0.75
1   Franck  2           0           2             1.00
2   Jack    0           2           2             0.00
3   Mattieu 1           3           4             0.25

Manupilation 2: I would like to get a group by mean of the column "Mean (True/(False+True))" grouped by "Total value", like this :

    Group by Total Value    Mean of grouped Total Value
0   2                       0.50
1   3                       0.75
2   4                       0.25

Thanks in advance for your help

Upvotes: 1

Views: 417

Answers (1)

BENY
BENY

Reputation: 323316

First one can be done with crosstab

s1 = pd.crosstab(df['Name'], df['Value'], margins=True).drop('All').assign(Mean = lambda x : x[True]/x['All'])
Out[266]: 
Value    False  True  All      Mean
Name                               
Eric         1     2    3  0.666667
Franck       0     2    2  1.000000
Jack         2     0    2  0.000000
Mattieu      3     1    4  0.250000

2nd dataframe do with groupby

s2 = s1.groupby('All').apply(lambda x : sum(x[True])/sum(x['All'])).reset_index(name='Mean of ALL')
Out[274]: 
   All  Mean of ALL
0    2     0.500000
1    3     0.666667
2    4     0.250000

Upvotes: 3

Related Questions