Chopin
Chopin

Reputation: 214

Groupby contains two specific values - pandas

I'm aiming to return rows in a pandas df that contain two specific values grouped by a separate column. Using below, I'm grouping by Num and aiming to return rows where B is present but not A for each unique group.

If neither A nor B is assigned to a grouped value then continue. I only want to return the rows where B is present but not A.

import pandas as pd

df = pd.DataFrame({      
    'Num' : [1,1,2,2,2,2,3,3,4,4,4,4],
    'Label' : ['X','Y','X','B','B','B','A','B','B','A','B','X'],          
   })

df = df.loc[(df['Label'] == 'A') | (df['Label'] == 'B')]
df = df.groupby('Num').filter(lambda x: any(x['Label'] == 'A'))
df = df.groupby('Num').filter(lambda x: any(x['Label'] == 'B'))

intended output:

    Num Label  
2     2     B      
3     2     B  
4     2     B  
5     2     B 

Upvotes: 2

Views: 994

Answers (1)

jezrael
jezrael

Reputation: 862731

You can filter if all values per groups are B by GroupBy.transform with GroupBy.all:

df1 = df.loc[(df['Label'] == 'A') | (df['Label'] == 'B')]
df1 = df1[(df1['Label'] == 'B').groupby(df1['Num']).transform('all')]
print (df1)
   Num Label
3    2     B
4    2     B
5    2     B

If need fitler original column Num use:

df = df[df['Num'].isin(df1['Num'])]
print (df)
   Num Label
2    2     X
3    2     B
4    2     B
5    2     B

Another approach is filter by numpy.setdiff1d:

num = np.setdiff1d(df.loc[(df['Label'] == 'B'), 'Num'],
                   df.loc[(df['Label'] == 'A'), 'Num'])

df = df[df['Num'].isin(num)]
print (df)
   Num Label
2    2     X
3    2     B
4    2     B
5    2     B

Upvotes: 2

Related Questions