Reputation: 214
I'm aiming to return rows in a pandas df that contain two specific values grouped by a separate column. Using below, I'm grouping by Num
and aiming to return rows where B
is present but not A
for each unique group.
If neither A
nor B
is assigned to a grouped value then continue. I only want to return the rows where B
is present but not A
.
import pandas as pd
df = pd.DataFrame({
'Num' : [1,1,2,2,2,2,3,3,4,4,4,4],
'Label' : ['X','Y','X','B','B','B','A','B','B','A','B','X'],
})
df = df.loc[(df['Label'] == 'A') | (df['Label'] == 'B')]
df = df.groupby('Num').filter(lambda x: any(x['Label'] == 'A'))
df = df.groupby('Num').filter(lambda x: any(x['Label'] == 'B'))
intended output:
Num Label
2 2 B
3 2 B
4 2 B
5 2 B
Upvotes: 2
Views: 994
Reputation: 862731
You can filter if all values per groups are B
by GroupBy.transform
with GroupBy.all
:
df1 = df.loc[(df['Label'] == 'A') | (df['Label'] == 'B')]
df1 = df1[(df1['Label'] == 'B').groupby(df1['Num']).transform('all')]
print (df1)
Num Label
3 2 B
4 2 B
5 2 B
If need fitler original column Num
use:
df = df[df['Num'].isin(df1['Num'])]
print (df)
Num Label
2 2 X
3 2 B
4 2 B
5 2 B
Another approach is filter by numpy.setdiff1d
:
num = np.setdiff1d(df.loc[(df['Label'] == 'B'), 'Num'],
df.loc[(df['Label'] == 'A'), 'Num'])
df = df[df['Num'].isin(num)]
print (df)
Num Label
2 2 X
3 2 B
4 2 B
5 2 B
Upvotes: 2