idt_tt

Reputation: 601

pandas: how to drop all rows of a group if the last row of the group has a certain column value

I have a DataFrame as shown below:

    a    c    d
0  ABC   0.4  y
1  ABC   0.3  x
2  DEF   0.3  x
3  DEF   0.2  x
4  DEF   0.5  x
5  DEF   0.4  y

I would like to sort df by column 'c', group by column 'a', and then drop ALL rows of a group if the last row of that group (after sorting) has 'd' == 'y'.

My expected output is:

    a    c    d
3  DEF   0.2  x
2  DEF   0.3  x
5  DEF   0.4  y
4  DEF   0.5  x

Group 'ABC' is dropped because, after sorting by 'c', its last row has d = 'y'. Group 'DEF' stays because its last row has d = 'x'.

Upvotes: 0

Views: 341

Answers (2)

BENY

Reputation: 323316

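Use groupby with filter. The row with the largest 'c' in a group is the group's last row once sorted by 'c', so keep a group only if that row does not have d = 'y':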

df = df.groupby('a').filter(lambda x: x.at[x['c'].idxmax(), 'd'] != 'y')
df
Out[278]: 
     a    c  d
2  DEF  0.3  x
3  DEF  0.2  x
4  DEF  0.5  x
5  DEF  0.4  y
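
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the filter approach. The sample frame is rebuilt from the question, and the helper name last_row_is_not_y is made up for illustration; it assumes the index labels are unique, since idxmax and .at look rows up by label.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "a": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "c": [0.4, 0.3, 0.3, 0.2, 0.5, 0.4],
    "d": ["y", "x", "x", "x", "x", "y"],
})

def last_row_is_not_y(group: pd.DataFrame) -> bool:
    # The row with the largest `c` is the group's last row after sorting by `c`.
    # (Hypothetical helper name; equivalent to the lambda above.)
    return group.at[group["c"].idxmax(), "d"] != "y"

# filter keeps every row of a group for which the function returns True.
kept = df.groupby("a").filter(last_row_is_not_y)
print(kept)
```

Note that filter keeps the rows in their original order; chain .sort_values('c') afterwards if the result should be sorted.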

Upvotes: 1

Quang Hoang

Reputation: 150785

Straight from your logic:

mask = (df.sort_values('c')     # sort the rows by `c`
          .groupby('a')['d']    # group by `a` and look at `d`
          .transform('last')    # give every row its group's last `d` value
          .ne('y')              # True where the group's last `d` is not `y`
          .reindex(df.index)    # restore the original row order
       )

df = df[mask]

Output:

     a    c  d
2  DEF  0.3  x
3  DEF  0.2  x
4  DEF  0.5  x
5  DEF  0.4  y
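
To see what each step of the mask contributes, here is a small sketch that splits the chain into named intermediates. The sample frame is rebuilt from the question, and the variable names last_d and result are only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "a": ["ABC", "ABC", "DEF", "DEF", "DEF", "DEF"],
    "c": [0.4, 0.3, 0.3, 0.2, 0.5, 0.4],
    "d": ["y", "x", "x", "x", "x", "y"],
})

# For every row, the `d` value of the last row of its group after sorting by `c`.
# The result keeps the original index labels, but in sorted order.
last_d = df.sort_values("c").groupby("a")["d"].transform("last")

# Keep rows whose group does not end in 'y'; reindex to match df's row order.
mask = last_d.ne("y").reindex(df.index)
result = df[mask]

print(last_d.sort_index())
print(result)
```

Because transform returns one value per original row, the mask can be applied directly to df without merging anything back.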

Upvotes: 2
