Edward

Reputation: 573

How to extract the row before the value of a column changes

I have a dataframe as follows:

import pandas as pd

df=pd.DataFrame({'value':[3,4,7,8,11,20,15,20,15,16],
                 'ID':[2,2,8,8,8,2,2,2,5,5],
                 'distance':[0,0,1,0,0,0,2,0,0,0]})

print(df)
   value  ID  distance
0      3   2         0
1      4   2         0
2      7   8         1
3      8   8         0
4     11   8         0
5     20   2         0
6     15   2         2
7     20   2         0
8     15   5         0
9     16   5         0

I want to extract the row that occurs just before the value in the distance column changes from 0 to 1.

From my sample dataframe I would expect the output to be as follows:

df_out=pd.DataFrame({'value':[4],'ID':[2],'distance':[0]})

print(df_out)
   value  ID  distance
0      4   2         0

Upvotes: 0

Views: 228

Answers (3)

Andy L.

Reputation: 25239

Just check that the difference with the next row, diff(-1), equals -1 and that distance equals 0:

m = df.distance.diff(-1).eq(-1) & df.distance.eq(0)
df.loc[m]

Out[635]:
   value  ID  distance
1      4   2         0
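To see why this mask works, the sketch below prints the intermediate diff(-1) values for the question's data. diff(-1) subtracts the following row, so -1 means the next distance is exactly one larger; the extra eq(0) condition limits the match to a 0 -> 1 change (without it, a 1 -> 2 step would also match). The last row has no successor and gets NaN, which never equals -1. The variable name step is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]})

# diff(-1) computes distance[i] - distance[i + 1]; the last row has no
# following row, so its result is NaN
step = df['distance'].diff(-1)
print(step.tolist())

# -1 means the next row is exactly one larger; also requiring
# distance == 0 restricts the match to a 0 -> 1 change
m = step.eq(-1) & df['distance'].eq(0)
print(df.loc[m])
```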

Upvotes: 1

sam

Reputation: 1896

Hope this helps!

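I am using an extra column, mask, to hold the shifted values of distance. First I initialize the column to zero, and then I fill every row except the last with the distance value of the following row. I use .loc here because chained indexing such as df['mask'][:-1] = ... is not guaranteed to write back to df (under Copy-on-Write it never does):

>>> df['mask'] = 0
>>> df.loc[df.index[:-1], 'mask'] = df['distance'].to_numpy()[1:]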
>>> df
   value  ID  distance  mask
0      3   2         0     0
1      4   2         0     1
2      7   8         1     0
3      8   8         0     0
4     11   8         0     0
5     20   2         0     2
6     15   2         2     0
7     20   2         0     0
8     15   5         0     0
9     16   5         0     0
>>> df_out = df[['value', 'ID', 'distance']][df['mask'] == 1].reset_index(drop=True)
>>> df_out 
   value  ID  distance
0      4   2         0     

This may not be the best solution, but it works.
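The mask column is simply distance shifted up by one row, so Series.shift can build it in a single step. This is a sketch of the same idea, not the answer's exact code; it also adds a distance == 0 check, which the mask == 1 filter alone does not enforce (without it, a row with distance 1 followed by another 1 would also match).

```python
import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]})

# shift(-1) moves each distance value up one row, so row i holds
# distance[i + 1]; fill_value=0 fills the last row and keeps the int dtype
df['mask'] = df['distance'].shift(-1, fill_value=0)

# keep rows whose next distance is 1 and whose own distance is 0
df_out = (df.loc[(df['mask'] == 1) & (df['distance'] == 0),
                 ['value', 'ID', 'distance']]
            .reset_index(drop=True))
print(df_out)
```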

Upvotes: 1

user2677285

Reputation: 313

I think you just need a slight change to Quang Hoang's second suggestion (the shift should be -1).

Like this:

df.loc[df['distance'].eq(0) & df['distance'].shift(-1).eq(1)]

This means: [if the distance at a row is 0, and the distance at the next row (row + 1) is 1], which is what you want. Its result is:

   value  ID  distance
1      4   2         0

The only difference from the expected output in your question is that the original index (1) is kept. If you really want the index reset, you can do so like this:

df[df['distance'].eq(0) & df['distance'].shift(-1).eq(1)].reset_index(drop=True)

which gives exactly what you wrote in your question:

   value  ID  distance
0      4   2         0

See the pandas documentation of shift for more information.
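The same condition can be wrapped in a small reusable function. This is a sketch only: the name rows_before_change and its old/new parameters are hypothetical, not part of pandas. Because the last row's shifted value is NaN, it can never satisfy eq(new), so the end of the frame needs no special handling.

```python
import pandas as pd


def rows_before_change(df, col, old=0, new=1):
    """Hypothetical helper: return the rows where `col` equals `old`
    and the following row's `col` equals `new` (original index kept)."""
    nxt = df[col].shift(-1)  # value of the following row; NaN for the last row
    return df[df[col].eq(old) & nxt.eq(new)]


df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]})

print(rows_before_change(df, 'distance'))
# the same condition also finds the row before the 0 -> 2 change
print(rows_before_change(df, 'distance', new=2))
```

With the question's data, the first call returns the row with index 1 and the second returns the row with index 5 (value 20, ID 2), the row just before distance jumps to 2.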

Upvotes: 1
