Reputation: 3432
Hello everyone I would need help in order to subset groups in a dataframes
Here is an exemple
Groups COL1
0 G1 DOG
1 G1 DOG
2 G1 DOG
3 G1 DOG
4 G1 DOG
5 G1 CAT
6 G1 CAT
7 G1 CAT
8 G2 DOG
9 G2 DOG
10 G2 DOG
11 G2 DOG
12 G2 DOG
13 G2 CAT
14 G2 CAT
15 G3 DOG
16 G3 DOG
17 G3 CAT
18 G4 DOG
19 G4 DOG
20 G4 DOG
21 G4 DOG
22 G4 DOG
23 G4 DOG
24 G4 CAT
25 G4 CAT
26 G5 DOG
27 G5 DOG
I would like to keep a Group
:
>= 5 DOG
& >=3 CAT
< 5 DOG
& >=1 CAT
The expected result would then be :
Groups COL1
0 G1 DOG
1 G1 DOG
2 G1 DOG
3 G1 DOG
4 G1 DOG
5 G1 CAT
6 G1 CAT
7 G1 CAT
15 G3 DOG
16 G3 DOG
17 G3 CAT
Does someone have an idea please ?
The data :
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G1', 7: 'G1', 8: 'G2', 9: 'G2', 10: 'G2', 11: 'G2', 12: 'G2', 13: 'G2', 14: 'G2', 15: 'G3', 16: 'G3', 17: 'G3', 18: 'G4', 19: 'G4', 20: 'G4', 21: 'G4', 22: 'G4', 23: 'G4', 24: 'G4', 25: 'G4', 26: 'G5', 27: 'G5'}, 'COL1': {0: 'DOG', 1: 'DOG', 2: 'DOG', 3: 'DOG', 4: 'DOG', 5: 'CAT', 6: 'CAT', 7: 'CAT', 8: 'DOG', 9: 'DOG', 10: 'DOG', 11: 'DOG', 12: 'DOG', 13: 'CAT', 14: 'CAT', 15: 'DOG', 16: 'DOG', 17: 'CAT', 18: 'DOG', 19: 'DOG', 20: 'DOG', 21: 'DOG', 22: 'DOG', 23: 'DOG', 24: 'CAT', 25: 'CAT', 26: 'DOG', 27: 'DOG'}}
Upvotes: 0
Views: 35
Reputation: 863186
Use crosstab
for counts, then filter groups
by index
in boolean indexing
and pass to Series.isin
for test original Groups
column:
df1 = pd.crosstab(df['Groups'], df['COL1'])
g = df1.index[((df1['DOG'] >= 5) & (df1['CAT'] >= 3)) |
(df1['DOG'] < 5) & (df1['CAT'] >=1 )]
df = df[df['Groups'].isin(g)]
print (df)
Groups COL1
0 G1 DOG
1 G1 DOG
2 G1 DOG
3 G1 DOG
4 G1 DOG
5 G1 CAT
6 G1 CAT
7 G1 CAT
15 G3 DOG
16 G3 DOG
17 G3 CAT
Another solution:
df1 = pd.crosstab(df['Groups'], df['COL1'])
df2 = df1.query('(DOG >= 5 & CAT >= 3) | (DOG < 5 & CAT >=1)')
df = df[df['Groups'].isin(df2.index)]
Upvotes: 2