Reputation: 573
I have a dataframe as follows:
import pandas as pd
df=pd.DataFrame({'value':[3,4,7,8,11,20,15,20,15,16],
                 'ID':[2,2,8,8,8,2,2,2,5,5],
                 'distance':[0,0,1,0,0,0,2,0,0,0]})
print(df)
   value  ID  distance
0      3   2         0
1      4   2         0
2      7   8         1
3      8   8         0
4     11   8         0
5     20   2         0
6     15   2         2
7     20   2         0
8     15   5         0
9     16   5         0
I want to extract the row that occurs just before the value in the column distance changes from 0 to 1.
From my sample dataframe I would expect the output to be as follows:
df_out=pd.DataFrame({'value':[4],'ID':[2],'distance':[0]})
print(df_out)
   value  ID  distance
0      4   2         0
Upvotes: 0
Views: 228
Reputation: 25239
Just check that the backward diff equals -1 and that distance equals 0:
m = df.distance.diff(-1).eq(-1) & df.distance.eq(0)
df.loc[m]
Out[635]:
   value  ID  distance
1      4   2         0
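To see why this works, here is a minimal sketch that prints the intermediate values on the sample data from the question. The variable names backward_diff and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]})

# diff(-1) computes distance[i] - distance[i + 1];
# the last row has no successor, so its value is NaN
backward_diff = df['distance'].diff(-1)

# A 0 followed by a 1 gives 0 - 1 == -1. The extra eq(0) check is needed
# because other steps (for example 1 -> 2) would also give -1.
m = backward_diff.eq(-1) & df['distance'].eq(0)
result = df.loc[m]

print(backward_diff.tolist())
print(result)
```

On this data only the row with index 1 has a backward diff of -1 together with distance 0, so it is the only row selected.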
Upvotes: 1
Reputation: 1896
Hope this helps!
I am using an extra column, mask, to hold the shifted values of distance. First I initialize the column to zero, and then I update that column with the values of the distance column shifted up by one row:
>>> df['mask'] = 0
>>> # assign positionally through .loc to avoid chained assignment
>>> df.loc[df.index[:-1], 'mask'] = df['distance'].to_numpy()[1:]
>>> df
   value  ID  distance  mask
0      3   2         0     0
1      4   2         0     1
2      7   8         1     0
3      8   8         0     0
4     11   8         0     0
5     20   2         0     2
6     15   2         2     0
7     20   2         0     0
8     15   5         0     0
9     16   5         0     0
>>> df_out = df.loc[df['mask'] == 1, ['value', 'ID', 'distance']].reset_index(drop=True)
>>> df_out
   value  ID  distance
0      4   2         0
This may not be the best solution, but it works.
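As a possible variation on the same idea, the helper column could also be built in a single step with shift. This is only a sketch, assuming a pandas version that supports the fill_value argument of Series.shift (0.24 or later); the column name mask is kept from above.

```python
import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]})

# shift(-1) pulls each row's successor up by one position;
# fill_value=0 fills the last row, which has no successor
df['mask'] = df['distance'].shift(-1, fill_value=0)

# Keep rows whose next distance is 1, then drop the helper column
df_out = (df.loc[df['mask'] == 1, ['value', 'ID', 'distance']]
            .reset_index(drop=True))
print(df_out)
```

Note that, like the version above, this only checks that the next distance is 1. It does not check that the current distance is 0, which happens to be enough for the sample data.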
Upvotes: 1
Reputation: 313
I think you just need a slight change to Quang Hoang's second suggestion (the shift should be -1).
Like this: df.loc[df['distance'].eq(0) & df['distance'].shift(-1).eq(1)]
This means: [distance at a row is 0, and distance at the next row (row+1) is 1], which is what you want. Its result is:
   value  ID  distance
1      4   2         0
The only difference from the expected output you wrote is that the original index (1) is kept. If you really want the index reset, you can do it like this:
df[df['distance'].eq(0) & df['distance'].shift(-1).eq(1)].reset_index(drop=True)
which gives exactly what you wrote in your question:
   value  ID  distance
0      4   2         0
You can look at the docs for more information on shift.
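If it helps to see what shift(-1) actually does, here is a small sketch on the sample data that puts each row's distance next to the following row's distance. The names next_distance and side_by_side are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 1, 0, 0, 0, 2, 0, 0, 0]})

# shift(-1) moves every value up one position, so each row sees the
# NEXT row's distance; the last row becomes NaN
next_distance = df['distance'].shift(-1)
side_by_side = df.assign(next_distance=next_distance)
print(side_by_side)

# Current distance is 0 and the next distance is 1
mask = df['distance'].eq(0) & next_distance.eq(1)
df_out = df[mask].reset_index(drop=True)
print(df_out)
```

The NaN in the last row compares as False in eq(1), so the last row can never be selected by accident.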
Upvotes: 1