Puneet Sinha

Reputation: 1081

Python pandas DataFrame: Delete all rows until the first occurrence of a certain value

I have a pandas DataFrame that looks like the table below. Please let me know if you need the pd.DataFrame code for it.

iD      a   b   c
c1      2   3   4
c1      2   3   4
c1      2   3   4
c1      2   E   4
c1      2   3   4
c2      3   4   5
c2      3   4   5
c2      3   E   5
c2      3   4   5

This DataFrame has two IDs, c1 and c2. For each ID, I want to delete all the rows above the first row where 'E' appears in column 'b'.

My final DataFrame should look like this:

iD      a   b   c
c1      2   E   4
c1      2   3   4
c2      3   E   5
c2      3   4   5

I am trying to keep the question short so it is easy to answer. Please let me know if I should add more data points to the DataFrame.

Upvotes: 1

Views: 2266

Answers (2)

yatu

Reputation: 88236

You could group by iD and use boolean indexing with idxmax to keep each group's rows from the first 'E' in column b onwards:

(
    df.groupby('iD')
      .apply(lambda x: x.loc[(x.b == 'E').idxmax():, :])
      .reset_index(drop=True)
)

   iD  a  b  c
0  c1  2  E  4
1  c1  2  3  4
2  c2  3  E  5
3  c2  3  4  5
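
For reference, here is a self-contained sketch of the same idxmax idea, written as an explicit loop over the groups instead of apply. The helper name from_first_e is hypothetical, and storing column b as strings is an assumption about the question's data. Note that idxmax on an all-False mask returns the group's first label, so without the guard below a group containing no 'E' would be kept whole; dropping such a group is an assumption about the desired behaviour.

```python
import pandas as pd

# Sample data from the question; column "b" is assumed to hold strings.
df = pd.DataFrame({
    "iD": ["c1"] * 5 + ["c2"] * 4,
    "a": [2] * 5 + [3] * 4,
    "b": ["3", "3", "3", "E", "3", "4", "4", "E", "4"],
    "c": [4] * 5 + [5] * 4,
})


def from_first_e(group):
    """Hypothetical helper: keep rows from the first 'E' in column b onwards."""
    is_e = group["b"].eq("E")
    if not is_e.any():
        # Assumption: a group with no 'E' contributes no rows at all.
        return group.iloc[0:0]
    # idxmax returns the index label of the first True value in the mask.
    return group.loc[is_e.idxmax():]


# Apply the helper to each iD group and stitch the pieces back together.
result = pd.concat(
    [from_first_e(g) for _, g in df.groupby("iD")],
    ignore_index=True,
)
print(result)
```

Removing the guard reproduces the behaviour of the one-liner above, where a group without any 'E' passes through unchanged.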

Upvotes: 1

cs95

Reputation: 402403

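Use groupby and cumsum on a boolean mask comparing column "b" to the letter "E", then keep the rows where the running count is greater than zero (recent pandas versions return integers from cumsum, so the explicit comparison keeps the result a boolean mask):

df[df.b.eq('E').groupby(df.iD).cumsum().gt(0)]

   iD  a  b  c
3  c1  2  E  4
4  c1  2  3  4
7  c2  3  E  5
8  c2  3  4  5

To renumber the index, add reset_index:

df[df.b.eq('E').groupby(df.iD).cumsum().gt(0)].reset_index(drop=True)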

   iD  a  b  c
0  c1  2  E  4
1  c1  2  3  4
2  c2  3  E  5
3  c2  3  4  5
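
The mask can also be built step by step, which may make the logic easier to follow. This is a minimal sketch: the variable names are illustrative, column b is assumed to hold strings, and the extra group c3 (which has no 'E') is a hypothetical addition to show that this approach drops such groups entirely.

```python
import pandas as pd

# Data from the question plus a hypothetical group "c3" with no 'E'.
df = pd.DataFrame({
    "iD": ["c1"] * 5 + ["c2"] * 4 + ["c3"] * 2,
    "a": [2] * 5 + [3] * 4 + [6] * 2,
    "b": ["3", "3", "3", "E", "3", "4", "4", "E", "4", "7", "7"],
    "c": [4] * 5 + [5] * 4 + [8] * 2,
})

# Step 1: True wherever column b equals 'E'.
is_e = df["b"].eq("E")

# Step 2: running count of 'E' within each iD; it stays 0 until the first 'E'.
seen = is_e.groupby(df["iD"]).cumsum()

# Step 3: keep rows at or after the first 'E' of their group.
mask = seen.gt(0)

result = df[mask].reset_index(drop=True)
print(result)
```

Because the running count never rises above zero for c3, none of its rows survive the filter.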

Upvotes: 7
