Reputation: 3432
Hello everyone I woule need help in order to count the percentage of values within columns and groups. Here is an exempel dataframe
Groups COL1
0 G1 DOG
1 G1 DOG
2 G1 DOG
3 G1 CAT
4 G2 CAT
5 G2 DOG
6 G3 DOG
7 G3 CAT
8 G3 CAT
9 G3 CAT
10 G3 CAT
11 G4 DOG
12 G4 DOG
and I would like to only keep Groups where the percentage of DOG
>= 50%
so here for instance :
G1 = 3/4 = 75% I keep this group
G2 = 1/2 = 50% I remove this group
G3 = 1/5 = 20% I remove this group
G4 = 2/2 = 100% I keep this groups
Then I should get
Groups COL1
0 G1 DOG
1 G1 DOG
2 G1 DOG
3 G1 CAT
11 G4 DOG
12 G4 DOG
does someone have an idea please ?
here is the dataframe in dict format if it can helps:
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G3', 7: 'G3', 8: 'G3', 9: 'G3', 10: 'G3', 11: 'G4', 12: 'G4'}, 'COL1': {0: 'DOG', 1: 'DOG', 2: 'DOG', 3: 'CAT', 4: 'CAT', 5: 'DOG', 6: 'DOG', 7: 'CAT', 8: 'CAT', 9: 'CAT', 10: 'CAT', 11: 'DOG', 12: 'DOG'}}
Upvotes: 1
Views: 72
Reputation: 862671
Use GroupBy.transform
with mean
of boolean mask for compared DOG
and filter by Series.gt
for greater in boolean indexing
:
df = df[df['COL1'].eq('DOG').groupby(df['Groups']).transform('mean').gt(0.5)]
print (df)
Groups COL1
0 G1 DOG
1 G1 DOG
2 G1 DOG
3 G1 CAT
11 G4 DOG
12 G4 DOG
Upvotes: 1