chippycentra
chippycentra

Reputation: 3432

Subset groups in a dataframe according to threshold in two columns in pandas

Hello everyone I would need help in order to subset groups in a dataframes

Here is an exemple

       Groups COL1
0      G1  DOG
1      G1  DOG
2      G1  DOG
3      G1  DOG
4      G1  DOG
5      G1  CAT
6      G1  CAT
7      G1  CAT
8      G2  DOG
9      G2  DOG
10     G2  DOG
11     G2  DOG
12     G2  DOG
13     G2  CAT
14     G2  CAT
15     G3  DOG
16     G3  DOG
17     G3  CAT
18     G4  DOG
19     G4  DOG
20     G4  DOG
21     G4  DOG
22     G4  DOG
23     G4  DOG
24     G4  CAT
25     G4  CAT
26     G5  DOG
27     G5  DOG

I would like to keep a Group :

The expected result would then be :

          Groups COL1
0      G1  DOG
1      G1  DOG
2      G1  DOG
3      G1  DOG
4      G1  DOG
5      G1  CAT
6      G1  CAT
7      G1  CAT
15     G3  DOG
16     G3  DOG
17     G3  CAT

Does someone have an idea please ?

The data :

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G1', 7: 'G1', 8: 'G2', 9: 'G2', 10: 'G2', 11: 'G2', 12: 'G2', 13: 'G2', 14: 'G2', 15: 'G3', 16: 'G3', 17: 'G3', 18: 'G4', 19: 'G4', 20: 'G4', 21: 'G4', 22: 'G4', 23: 'G4', 24: 'G4', 25: 'G4', 26: 'G5', 27: 'G5'}, 'COL1': {0: 'DOG', 1: 'DOG', 2: 'DOG', 3: 'DOG', 4: 'DOG', 5: 'CAT', 6: 'CAT', 7: 'CAT', 8: 'DOG', 9: 'DOG', 10: 'DOG', 11: 'DOG', 12: 'DOG', 13: 'CAT', 14: 'CAT', 15: 'DOG', 16: 'DOG', 17: 'CAT', 18: 'DOG', 19: 'DOG', 20: 'DOG', 21: 'DOG', 22: 'DOG', 23: 'DOG', 24: 'CAT', 25: 'CAT', 26: 'DOG', 27: 'DOG'}}

Upvotes: 0

Views: 35

Answers (1)

jezrael
jezrael

Reputation: 863186

Use crosstab for counts, then filter groups by index in boolean indexing and pass to Series.isin for test original Groups column:

df1 = pd.crosstab(df['Groups'], df['COL1'])

g = df1.index[((df1['DOG'] >= 5) & (df1['CAT'] >= 3)) | 
               (df1['DOG'] < 5) & (df1['CAT'] >=1 )]

df = df[df['Groups'].isin(g)]
print (df)
   Groups COL1
0      G1  DOG
1      G1  DOG
2      G1  DOG
3      G1  DOG
4      G1  DOG
5      G1  CAT
6      G1  CAT
7      G1  CAT
15     G3  DOG
16     G3  DOG
17     G3  CAT

Another solution:

df1 = pd.crosstab(df['Groups'], df['COL1'])

df2 = df1.query('(DOG >= 5 & CAT >= 3) | (DOG < 5 & CAT >=1)')

df = df[df['Groups'].isin(df2.index)]

Upvotes: 2

Related Questions