Reputation: 842
My dataframe looks like
ID colA
1 B
1 D
2 B
2 D
2 C
I have return all rows after the last occurrence of event B in each group. The output will be :
ID colA
1 D
2 D
2 C
I tried
a = df['colA'].str.contains('B').groupby(df['ID'])
b = df[(a.transform('sum') - a.cumsum()).eq(0)]
and it's working fine so far. I am just wondering if there is any alternative approach to achieve this?
Upvotes: 4
Views: 1638
Reputation: 88236
You could do:
ix = (df.colA.eq('B')
.cumsum()
.groupby(df.ID)
.apply(lambda x: x.loc[x.idxmax()+1:]).index.get_level_values(1))
df.loc[ix,:]
ID colA
1 1 D
3 2 D
4 2 C
Upvotes: 3
Reputation: 402383
Reverse your rows (this is important). Then call groupby
and cumsum
, and take all rows with (reversed) cumsum value equal to zero.
df[df.colA.eq('B')[::-1].astype(int).groupby(df.ID).cumsum().eq(0)]
ID colA
1 1 D
3 2 D
4 2 C
Upvotes: 7
Reputation: 323226
IIUC
def yourlast(x):
return x.loc[x.colA.where(df.colA.eq('B')).last_valid_index()+1:]
df.groupby('ID').apply(yourlast)
Out[163]:
ID colA
ID
1 1 1 D
2 3 2 D
4 2 C
Upvotes: 2