Reputation: 1081
I have a pandas DataFrame that looks like the table below. Please let me know if you need the pd.DataFrame
code to build it.
iD a b c
c1 2 3 4
c1 2 3 4
c1 2 3 4
c1 2 E 4
c1 2 3 4
c2 3 4 5
c2 3 4 5
c2 3 E 5
c2 3 4 5
In this DataFrame there are two IDs, c1 and c2. Within each ID, I want to delete all the rows above the row where 'E' appears in column 'b'.
My final DataFrame should look like this:
iD a b c
c1 2 E 4
c1 2 3 4
c2 3 E 5
c2 3 4 5
I am trying to keep the question short so it is easy to answer. Please let me know if I should add more data points to the DataFrame.
Upvotes: 1
Views: 2266
Reputation: 88236
You could groupby iD and use boolean indexing with idxmax to keep each group from the row where the first 'E' is found in column b onwards:
(df.groupby('iD').apply(lambda x: x.loc[(x.b == 'E').idxmax():,:])
   .reset_index(drop=True))
iD a b c
0 c1 2 E 4
1 c1 2 3 4
2 c2 3 E 5
3 c2 3 4 5
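For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idxmax idea. It assumes column b holds strings, and it uses a hypothetical helper keep_from_first_e with an explicit loop instead of apply. One thing worth knowing: idxmax on an all-False mask returns the first label, so a group with no 'E' would be kept whole; the helper makes that choice explicit.

```python
import pandas as pd

# Sample data from the question (assumption: column "b" is stored as strings).
df = pd.DataFrame({
    "iD": ["c1"] * 5 + ["c2"] * 4,
    "a": [2] * 5 + [3] * 4,
    "b": ["3", "3", "3", "E", "3", "4", "4", "E", "4"],
    "c": [4] * 5 + [5] * 4,
})

def keep_from_first_e(group):
    """Hypothetical helper: keep rows from the first 'E' in column b onwards."""
    mask = group["b"] == "E"
    if not mask.any():
        # Assumption: a group without any 'E' is kept unchanged.
        return group
    # idxmax on a boolean Series gives the label of the first True.
    return group.loc[mask.idxmax():]

parts = [keep_from_first_e(g) for _, g in df.groupby("iD")]
result = pd.concat(parts).reset_index(drop=True)
print(result)
```

If groups without an 'E' should be dropped instead, return an empty slice such as group.iloc[0:0] in that branch.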
Upvotes: 1
Reputation: 402403
Use groupby and cumsum on a boolean mask that compares column "b" to the letter "E". Within each iD, the running count becomes positive at the first "E" and stays positive afterwards:
df[df.b.eq('E').groupby(df.iD).cumsum().gt(0)]
iD a b c
3 c1 2 E 4
4 c1 2 3 4
7 c2 3 E 5
8 c2 3 4 5
df[df.b.eq('E').groupby(df.iD).cumsum().gt(0)].reset_index(drop=True)
iD a b c
0 c1 2 E 4
1 c1 2 3 4
2 c2 3 E 5
3 c2 3 4 5
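A self-contained sketch of the cumsum approach, for reference. It assumes column b holds strings, and it compares the running count with 0 so the mask is explicitly boolean, since the grouped cumsum of a boolean Series may come back as integers depending on the pandas version. The names is_e and seen_e are only illustrative.

```python
import pandas as pd

# Sample data from the question (assumption: column "b" is stored as strings).
df = pd.DataFrame({
    "iD": ["c1"] * 5 + ["c2"] * 4,
    "a": [2] * 5 + [3] * 4,
    "b": ["3", "3", "3", "E", "3", "4", "4", "E", "4"],
    "c": [4] * 5 + [5] * 4,
})

# True where column b equals "E".
is_e = df["b"].eq("E")

# Running count of "E" per iD; positive from the first "E" onwards.
seen_e = is_e.groupby(df["iD"]).cumsum().gt(0)

# Boolean indexing keeps the original index; reset it for a clean 0..n-1 index.
result = df[seen_e].reset_index(drop=True)
print(result)
```

Note that with this mask a group that never contains "E" is dropped entirely, because its running count stays at 0.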
Upvotes: 7