Reputation: 973
I'm trying to remove some rows in my table based on the change column.
id subindex change
A1 1 0
A1 2 1
A1 3 12
A1 4 0
A1 5 15
A1 6 1
A2 1 0
A2 2 11
A2 3 1
A2 4 0
In a subgroup (Id is identical) When change is higher than 10, I want to remove the subsequent rows. So the table would be:
id subindex change
A1 1 0
A1 2 1
A2 1 0
What I did is a groupby to identify where the sub-tables should be cut:
df['cut_position']=df[df.change >= 10].groupby('id')['subindex'].transform(lambda x: x.min())
that gives:
id subindex change cut_position
A1 1 0 0
A1 2 1 0
A1 3 12 3
A1 4 0 0
A1 5 15 0
A1 6 1 0
A2 1 0 0
A2 2 11 2
A2 3 1 0
A2 4 0 0
Then a groupby('id') should be easy to do, but I get problems trying to obtain that table:
id subindex change cut_position
A1 1 0 0
A1 2 1 0
A1 3 12 ToRemove_3
A1 4 0 ToRemove_0
A1 5 15 ToRemove_0
A1 6 1 ToRemove_0
A2 1 0 0
A2 2 11 ToRemove_2
A2 3 1 ToRemove_0
A2 4 0 ToRemove_0
When writing the code that could generate the table, I get :
df.groupby('id')[df['subindex'] >= df['cut_position'].max()].map(lambda x : 'ToRemove_' + x)
KeyError: 'Columns not found: False, True'
Note: I did something that worked, but it took way too much time to do it...
Upvotes: 0
Views: 5931
Reputation: 8503
Not 100% satisfied with this approach, but you could try it.
Given your frame
id subindex change
A1 1 0
A1 2 1
A1 3 12
A1 4 0
A1 5 15
A1 6 1
A2 1 0
A2 2 11
A2 3 1
A2 4 0
Then create a frame with the first row of each 'id' thats greater than 10
rowindex = df['change'] > 10
greaterThan10 = df[rowindex].groupby('id',as_index=False).first()
Than a merge() of your original frame and greaterThan10
dfKeep = pd.merge(df,greaterThan10,how='left',on=['id'],suffixes=['','_cut'])
Then filtering the new frame for the 'keepers'
mask = (dfKeep['subindex'] < dfKeep['subindex_cut'])
dfKeep[mask][['id','subindex','change']]
gives you
id subindex change
0 A1 1 0
1 A1 2 1
6 A2 1 0
Upvotes: 2