Canvas

Reputation: 41

Slice a dataframe starting from the first occurrence of a particular value

I want to slice a dataframe in pandas like this:

index  a    b
0      A    -
1      A    +
2      A    -
3      B    -
4      C    +
5      C    -

I want to keep all the rows from the first '+' onward, grouped by column a, and delete all the rows in each group starting with '-'. The outcome should look like this:

index  a    b
1      A    +
2      A    -
4      C    +
5      C    -

How can I do this?

Upvotes: 1

Views: 99

Answers (2)

jezrael

Reputation: 862611

Use GroupBy.cummax on a boolean column that marks where b equals '+'. This keeps all rows from the first '+' onward in each group:

df1 = (df[df.assign(new = lambda x: x['b'].eq('+'))
       .groupby('a')['new']
       .cummax()])

print (df1)
   a  b
1  A  +
2  A  -
4  C  +
5  C  -
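
To see why the cumulative maximum works, it can help to keep the intermediate columns visible. This is a minimal sketch that rebuilds the sample frame from the question, assuming the displayed index is the default RangeIndex; the names steps, is_plus and keep are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed default RangeIndex)
df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", "+", "-", "-", "+", "-"],
})

# is_plus: True where b is '+'
steps = df.assign(is_plus=df["b"].eq("+"))

# keep: running maximum of is_plus within each group of a;
# it turns True at the first '+' and stays True for the rest of the group
steps["keep"] = steps.groupby("a")["is_plus"].cummax().astype(bool)
print(steps)

# Use the helper column as a boolean mask on the original frame
df1 = df[steps["keep"]]
print(df1)
```

Group B never contains '+', so its keep value stays False and the whole group is dropped.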

Upvotes: 1

mozway

Reputation: 260600

A simpler syntax is to group the boolean Series directly and use cummax:

df[df['b'].eq('+').groupby(df['a']).cummax()]

output:

   index  a  b
1      1  A  +
2      2  A  -
4      4  C  +
5      5  C  -
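
If the same filter is needed for other columns or markers, the one-liner could be wrapped in a small function. This is only a sketch: keep_from_first and its parameter names are hypothetical, and the sample frame is rebuilt from the question without a separate index column.

```python
import pandas as pd


def keep_from_first(df, group_col, value_col, marker):
    """Return, per group, the rows from the first occurrence of marker onward.

    Hypothetical helper; groups that never contain marker are dropped entirely.
    """
    hit = df[value_col].eq(marker)
    # The running maximum per group stays True once the marker has been seen
    mask = hit.groupby(df[group_col]).cummax().astype(bool)
    return df[mask]


df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", "+", "-", "-", "+", "-"],
})

result = keep_from_first(df, "a", "b", "+")
print(result)
```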

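If you also want to delete groups that start with - ("delete all the rows in each group starting with '-'"), you can combine cummin/cummax. Passing group_keys=False keeps the original index on the result, so the mask still aligns with df on recent pandas versions:

df[df['b'].ne('-').groupby(df['a'], group_keys=False).apply(lambda s: s.cummin().cummax())]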

output:

   index  a  b
4      4  C  +
5      5  C  -
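
The cummin/cummax combination on the boolean mask ends up all True when a group's first value is True and all False otherwise, so the selection depends only on the first row of each group. Under that reading, an equivalent mask can be sketched with transform("first"). The sample frame is rebuilt from the question without a separate index column, and starts_plus is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["A", "A", "A", "B", "C", "C"],
    "b": ["-", "+", "-", "-", "+", "-"],
})

# True where b is not '-'
not_minus = df["b"].ne("-")

# Broadcast the first value of each group to every row of that group
starts_plus = not_minus.groupby(df["a"]).transform("first").astype(bool)

result = df[starts_plus]
print(result)
```

Only group C begins with '+', so only its rows remain.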

Upvotes: 1
