Reputation: 197

Python - Conditionally remove first row by group

I'm looking to conditionally remove the first row by each group in my data frame.

Within each 'ID', the first row should always have a 1 in the 'Start' column. If it does not, I would like to remove that row, and any further leading rows, until the group starts with a 1.

import pandas as pd

df = pd.DataFrame({'ID': ['A','A','B','B','C','C','C','D'],
                   'Start': [0,1,1,0,0,0,1,1],
                   'End': [1,0,0,1,1,1,0,0]})

  ID  Start  End
0  A      0    1
1  A      1    0
2  B      1    0
3  B      0    1
4  C      0    1
5  C      0    1
6  C      1    0
7  D      1    0

The result should look as follows:

result = pd.DataFrame({'ID': ['A','B','B','C','D'],
                       'Start': [1,1,0,1,1],
                       'End': [0,0,1,0,0]})

  ID  Start  End
0  A      1    0
1  B      1    0
2  B      0    1
3  C      1    0
4  D      1    0

Upvotes: 2

Answers (2)

BENY

Reputation: 323226

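Try idxmax with transform. For every row it returns the index label of the first 1 in that row's group, so you can keep only the rows at or after that label: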

df[df.index>=df.groupby('ID').Start.transform('idxmax')]
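
To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea on the question's data. The names first_one and kept are only illustrative. It assumes the index is sorted and unique (as with the default RangeIndex), since rows are compared by index label.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A','A','B','B','C','C','C','D'],
                   'Start': [0,1,1,0,0,0,1,1],
                   'End': [1,0,0,1,1,1,0,0]})

# Index label of the first maximum of 'Start' within each ID,
# broadcast back to every row of that group
first_one = df.groupby('ID')['Start'].transform('idxmax')

# Keep rows whose label is at or after that position
# (assumption: the index is sorted and unique)
kept = df[df.index >= first_one]
print(kept)
```

One caveat: idxmax returns the first label when every value is equal, so a group that contains no 1 at all would be kept in full rather than dropped.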

Upvotes: 1

cs95

Reputation: 402413

Use groupby and cumsum, then filter out the rows whose running sum is still 0. Those are exactly the rows that come before the first 1 in each group.

df[~df.groupby('ID')['Start'].cumsum().eq(0)]

  ID  Start  End
1  A      1    0
2  B      1    0
3  B      0    1
6  C      1    0
7  D      1    0
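
The output above keeps the original index labels. If you want the renumbered index shown in the question's expected result, a sketch along these lines should work. The names running and result are only illustrative, and the reset_index step is an addition to the one-liner, not part of it.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['A','A','B','B','C','C','C','D'],
                   'Start': [0,1,1,0,0,0,1,1],
                   'End': [1,0,0,1,1,1,0,0]})

# Running total of 'Start' within each ID; it stays 0 until the first 1 appears
running = df.groupby('ID')['Start'].cumsum()

# Drop the rows seen before the first 1, then renumber the index from 0
result = df[running.ne(0)].reset_index(drop=True)
print(result)
```

Note that a group with no 1 in 'Start' never leaves a running sum of 0, so all of its rows would be removed.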

Upvotes: 1
