Calculate percentage of element in column within groups in pandas

Question

Hello everyone I woule need help in order to count the percentage of values within columns and groups. Here is an exempel dataframe

       Groups COL1
0      G1  DOG
1      G1  DOG
2      G1  DOG
3      G1  CAT
4      G2  CAT
5      G2  DOG
6      G3  DOG
7      G3  CAT
8      G3  CAT
9      G3  CAT
10     G3  CAT
11     G4  DOG
12     G4  DOG

and I would like to only keep Groups where the percentage of DOG >= 50%

so here for instance :

G1 = 3/4 = 75% I keep this group
G2 = 1/2 = 50% I remove this group
G3 = 1/5 = 20% I remove this group 
G4 = 2/2 = 100% I keep this groups

Then I should get

   Groups COL1
0      G1  DOG
1      G1  DOG
2      G1  DOG
3      G1  CAT
11     G4  DOG
12     G4  DOG

does someone have an idea please ?

here is the dataframe in dict format if it can helps:

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G3', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G3', 11: 'G4', 12: 'G4'}, 'COL1': {0: 'DOG', 1: 'DOG', 2: 'DOG', 3: 'CAT', 4: 'CAT', 5: 'DOG', 6: 'DOG', 7: 'CAT', 8: 'CAT', 9: 'CAT', 10: 'CAT', 11: 'DOG', 12: 'DOG'}}

jezrael · Accepted Answer

Use GroupBy.transform with mean of boolean mask for compared DOG and filter by Series.gt for greater in boolean indexing:

df = df[df['COL1'].eq('DOG').groupby(df['Groups']).transform('mean').gt(0.5)]
print (df)
   Groups COL1
0      G1  DOG
1      G1  DOG
2      G1  DOG
3      G1  CAT
11     G4  DOG
12     G4  DOG

Calculate percentage of element in column within groups in pandas

Answers (1)

Related Questions