Reputation: 2189
I have a data frame like this,
col1 col2
A 10
B 12
C 16
D 20
E 18
F 15
G 23
H 20
I 19
J 17
K 21
L 26
Now I want to find the peak and buttom from col2 and want to keep only those rows if it falls under peak or bottom. for example, the the data frame I am looking for is,
col1 col2
A 10
D 20
F 15
G 23
J 17
L 26
I could do this using a for loop and store the indices by comparing the next and previous value. But it will take the longer time to execute. I am looking for some pythonic way/ pandas shortcuts to do it efficiently.
Upvotes: 0
Views: 936
Reputation: 863216
First idea is compare previous and next value returned from Series.shift
and for return also first and last values is testing with missing values:
s1 = df.col2.shift()
s2 = df.col2.shift(-1)
m1 = (s1 > df.col2) & (s2 > df.col2)
m2 = (s1 < df.col2) & (s2 < df.col2)
df = df[m1 | m2 | s1.isna() | s2.isna()]
Or better testing first and last values:
df = df[m1 | m2 | df.index.isin([0, len(df)-1])]
print (df)
col1 col2
0 A 10
3 D 20
5 F 15
6 G 23
9 J 17
11 L 26
Or use argrelextrema
for positions, join together by numpy.union1d
and select by DataFrame.iloc
:
from scipy.signal import argrelextrema
a = argrelextrema(df.col2.values, np.less_equal, order=1)[0]
b = argrelextrema(df.col2.values, np.greater_equal, order=1)[0]
df = df.iloc[np.union1d(a, b)]
print (df)
col1 col2
0 A 10
3 D 20
5 F 15
6 G 23
9 J 17
11 L 26
Upvotes: 1