Kallol
Kallol

Reputation: 2189

find rows with peaks and bottoms from a pandas data frame on a column

I have a data frame like this,

col1     col2
 A        10
 B        12 
 C        16
 D        20
 E        18
 F        15
 G        23
 H        20
 I        19
 J        17
 K        21
 L        26

Now I want to find the peak and buttom from col2 and want to keep only those rows if it falls under peak or bottom. for example, the the data frame I am looking for is,

col1     col2
 A        10
 D        20
 F        15
 G        23
 J        17
 L        26

I could do this using a for loop and store the indices by comparing the next and previous value. But it will take the longer time to execute. I am looking for some pythonic way/ pandas shortcuts to do it efficiently.

Upvotes: 0

Views: 936

Answers (1)

jezrael
jezrael

Reputation: 863216

First idea is compare previous and next value returned from Series.shift and for return also first and last values is testing with missing values:

s1 = df.col2.shift()
s2 = df.col2.shift(-1)

m1 = (s1 > df.col2) & (s2 > df.col2)
m2 = (s1 < df.col2) & (s2 < df.col2)

df = df[m1 | m2 | s1.isna() | s2.isna()]

Or better testing first and last values:

df = df[m1 | m2 | df.index.isin([0, len(df)-1])]

print (df)
   col1  col2
0     A    10
3     D    20
5     F    15
6     G    23
9     J    17
11    L    26

Or use argrelextrema for positions, join together by numpy.union1d and select by DataFrame.iloc:

from scipy.signal import argrelextrema

a = argrelextrema(df.col2.values, np.less_equal, order=1)[0]
b = argrelextrema(df.col2.values, np.greater_equal, order=1)[0]

df = df.iloc[np.union1d(a, b)]
print (df)
   col1  col2
0     A    10
3     D    20
5     F    15
6     G    23
9     J    17
11    L    26

Upvotes: 1

Related Questions