Reputation: 601
I have a DataFrame df as shown below:
a c d
0 ABC 0.4 y
1 ABC 0.3 x
2 DEF 0.3 x
3 DEF 0.2 x
4 DEF 0.5 x
5 DEF 0.4 y
I would like to sort df by column 'c', then group by column 'a', and then drop ALL rows of a group if the value of column 'd' is 'y' in the last row of that group.
My expected output is:
a c d
2 DEF 0.2 x
3 DEF 0.3 x
4 DEF 0.4 y
5 DEF 0.5 x
So group 'ABC' is deleted because, after sorting by column 'c', the last row of that group has d = 'y', while group 'DEF' stays because its last row has d = 'x'.
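For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and shows which row counts as the "last" one per group after sorting. The variable name last_rows is only illustrative, and the column values are copied from the table above.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "a": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "c": [0.4, 0.3, 0.3, 0.2, 0.5, 0.4],
    "d": ["y", "x", "x", "x", "x", "y"],
})

# After sorting by 'c', the last row of each group in 'a'
# is the row with the largest 'c' in that group.
last_rows = df.sort_values("c").groupby("a").tail(1)
print(last_rows)
```

Group 'ABC' ends on a row with d = 'y' and group 'DEF' ends on a row with d = 'x', which is why only 'DEF' should survive.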
Upvotes: 0
Views: 341
Reputation: 323316
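Let us use filter. After sorting by 'c', the last row of each group is the row with the largest 'c', so idxmax finds it without an explicit sort: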
df = df.groupby('a').filter(lambda x: x.at[x['c'].idxmax(), 'd'] != 'y')
Out[278]:
a c d
2 DEF 0.3 x
3 DEF 0.2 x
4 DEF 0.5 x
5 DEF 0.4 y
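Note that the output above keeps the original row order. If the rows should also come back sorted by 'c', as in the question's expected output, one possible variant (a sketch, not the only way) is to sort first and then test the last row of each group with iloc[-1]. The name result is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "c": [0.4, 0.3, 0.3, 0.2, 0.5, 0.4],
    "d": ["y", "x", "x", "x", "x", "y"],
})

result = (
    df.sort_values(["a", "c"])   # sort within each group by 'c'
      .groupby("a")
      # keep a group only if its last (largest 'c') row is not 'y'
      .filter(lambda g: g["d"].iloc[-1] != "y")
)
print(result)
```

The rows keep their original index labels; chain .reset_index(drop=True) if a fresh index is wanted. If several rows in a group tie for the largest 'c', which one ends up "last" depends on the sort, so a tie-breaking column may be needed.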
Upvotes: 1
Reputation: 150785
Straight from your logic:
mask = (df.sort_values('c')      # sort the values by `c`
          .groupby('a')['d']     # group by `a` and look at `d`
          .transform('last')     # broadcast each group's last `d` to all of its rows
          .ne('y')               # True where the group's last `d` is not `y`
          .reindex(df.index)     # reindex to the original row order
       )
df = df[mask]
Output:
a c d
2 DEF 0.3 x
3 DEF 0.2 x
4 DEF 0.5 x
5 DEF 0.4 y
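To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the sample frame and splits the chain into two steps; last_d and kept are illustrative names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "c": [0.4, 0.3, 0.3, 0.2, 0.5, 0.4],
    "d": ["y", "x", "x", "x", "x", "y"],
})

# transform('last') repeats each group's last 'd' on every row of that group.
last_d = df.sort_values("c").groupby("a")["d"].transform("last")

# One boolean per row, aligned back to the original row order.
mask = last_d.ne("y").reindex(df.index)
print(mask)

kept = df[mask]
print(kept)
```

Every 'ABC' row gets False because that group's last 'd' is 'y', and every 'DEF' row gets True, so the mask drops or keeps whole groups at once.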
Upvotes: 2