alpha-golf

Reputation: 61

How to find all local maxima and minima in a Python Pandas series without knowing the frequency of the window

As background to my question, please allow me to explain the problem I am trying to solve. I have a sensor that is collecting pressure data. I am collecting this data into a pandas dataframe, structured like this:

DateTime                             Transmission Line PSI                
2021-02-18 11:55:34                  3.760
2021-02-18 11:55:49                  3.359
2021-02-18 11:56:04                  3.142
2021-02-18 11:56:19                  3.009
2021-02-18 11:56:34                  2.938
...                                    ...
2021-02-19 12:05:06                  3.013
2021-02-19 12:05:21                  3.011
2021-02-19 12:05:36                  3.009
2021-02-19 12:05:51                  3.009
2021-02-19 12:06:06                  3.007

I can plot the dataframe with pyplot and see visually when the compressor that feeds the system is running, how often, and how long it takes to pressurize the system.

[Image: plot of pressure data]

As is evident from the image, the cycles on the left side of the plot are radically shorter than those on the right.

I want to programmatically calculate the max pressure, min pressure, period length, and duty cycle of the last complete on-off cycle. A bonus would be to programmatically calculate the total run time over a 24-hour period.

I figured that I would need to take the derivative of the pressure series, and I am using the solution from the question "python pandas: how to calculate derivative/gradient".

[Image: plot of the derivative series]

The derivative series will then show numerically when the compressor is running (positive numbers) and not (zero or negative numbers). I was thinking that I could then find all of the maxima and minima of the individual peaks and from there get the timedeltas between them.

However, the problem I'm running into is that every solution I've found so far requires me to know in advance how large a window to use (for example, the order argument of SciPy's argrelextrema: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.argrelextrema.html).
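
To make the window dependence concrete, here is a minimal sketch on a made-up seven-point array (an assumption for illustration, not the sensor data). With argrelextrema, order is the number of neighbours on each side that a point must exceed, so widening it drops the smaller peaks:

```python
import numpy as np
from scipy.signal import argrelextrema

# Toy signal (hypothetical): small peaks at positions 1 and 5, a larger one at 3.
x = np.array([0, 1, 0, 2, 0, 1, 0])

# order=1: a point only has to beat its immediate neighbours.
narrow = argrelextrema(x, np.greater, order=1)[0]

# order=2: a point has to beat two neighbours on each side,
# so the small peaks next to the large one are rejected.
wide = argrelextrema(x, np.greater, order=2)[0]

print(narrow.tolist())  # [1, 3, 5]
print(wide.tolist())    # [3]
```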

But my data series features cycles as short as minutes, and ideally (if we didn't have leaks!) cycles should stretch into hours or longer. Short windows produce false maxima and minima in the longer cycles, and long windows miss many maxima and minima in the shorter ones.

Any ideas for seeing programmatically what is plain to the eye in the above plot?

Upvotes: 1

Views: 2018

Answers (1)

alpha-golf

Reputation: 61

Mr.T's comment above had my answer... using scipy.signal.find_peaks allowed me to do what I needed. Posting the code below.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import scipy.signal as sig

namespace = ['DateTime', 'Transmission Line PSI']
    
plt.rcParams["figure.figsize"] = [16.0, 9.0]
fig, ax = plt.subplots()
df = pd.read_csv(r'\\192.168.1.1\raid\graphdata.csv', names=namespace)

# convert imported date/time information to real datetimes and set as index
df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.set_index('DateTime')

# take first derivative of pressure data to show when pressure is rising or falling
df['deltas'] = df['Transmission Line PSI'].diff() / df.index.to_series().diff().dt.total_seconds()
df['deltas'] = df['deltas'].fillna(0)

peaks, _ = sig.find_peaks(df['deltas'], height=0.01)
neg_peaks, _ = sig.find_peaks(-df['deltas'], height=0.01)

# plotting peaks and neg_peaks against first derivative
plt.scatter(df.iloc[peaks].index, df.iloc[peaks]['deltas'])
plt.scatter(df.iloc[neg_peaks].index, df.iloc[neg_peaks]['deltas'])
plt.plot(df['deltas'])
plt.show()

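# find timedeltas between all positive peaks - these are the periods of the cycle times, in minutes
# (total_seconds() keeps the days component, which .dt.seconds would drop for gaps of a day or more)
cycle_times = df.iloc[peaks].index.to_series().diff().dt.total_seconds().div(60, fill_value=0)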

# plot periods
plt.plot(cycle_times)
plt.show()

[Image: resulting plot of peaks against first derivative]

Sample of cycle_times:

>>> cycle_times
DateTime
2021-02-18 11:59:04     0.000000
2021-02-18 12:04:04     5.000000
2021-02-18 12:09:35     5.516667
2021-02-18 12:16:05     6.500000
2021-02-18 12:21:35     5.500000
                         ...    
2021-02-19 08:54:09    17.016667
2021-02-19 09:27:56    33.783333
2021-02-19 10:15:44    47.800000
2021-02-19 11:24:19    68.583333
2021-02-19 12:02:36    38.283333
Name: DateTime, Length: 267, dtype: float64

[Image: plot of cycle times]
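
The question also asked for the max pressure, min pressure, period, and duty cycle of the last complete cycle. One possible way to get there is sketched below, under stated assumptions: it runs on synthetic sawtooth data rather than graphdata.csv, and instead of find_peaks it uses a plain threshold on the derivative (the same 0.01 value) to flag samples where the compressor is running. The helper make_cycle and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

def make_cycle(n_on, n_off, low=3.0, high=4.0):
    """Hypothetical helper: linear rise for n_on samples, linear decay for n_off."""
    rise = np.linspace(low, high, n_on, endpoint=False)
    fall = np.linspace(high, low, n_off, endpoint=False)
    return np.concatenate([rise, fall])

# Synthetic stand-in for the sensor data (assumption: 15 s sampling).
psi = np.concatenate([make_cycle(4, 16), make_cycle(4, 36),
                      make_cycle(4, 56), [3.0, 3.25]])
index = pd.date_range("2021-02-18 12:00:00", periods=len(psi),
                      freq=pd.Timedelta(seconds=15))
pressure = pd.Series(psi, index=index, name="Transmission Line PSI")

# First derivative in PSI per second, computed as in the code above.
rate = pressure.diff() / pressure.index.to_series().diff().dt.total_seconds()
running = rate.fillna(0) > 0.01  # True where pressure is rising fast enough

# +1 marks an off -> on transition, -1 marks on -> off.
state = running.astype(int).diff()
starts = state[state == 1].index
stops = state[state == -1].index

if len(starts) < 2:
    raise ValueError("need at least two compressor starts for a complete cycle")

# The last complete cycle runs from the second-to-last start to the last start.
cycle_start, cycle_end = starts[-2], starts[-1]
stop = stops[(stops > cycle_start) & (stops < cycle_end)][0]

in_cycle = (pressure.index >= cycle_start) & (pressure.index < cycle_end)
cycle = pressure.loc[in_cycle]

period = cycle_end - cycle_start
on_time = stop - cycle_start
duty_cycle = on_time / period  # dividing two Timedeltas gives a plain float

print(period, on_time, round(duty_cycle, 4), cycle.max(), cycle.min())
```

On this synthetic series the last complete cycle has a 15-minute period with 60 seconds of run time. On real, noisy data the threshold would likely need tuning, or the derivative smoothing first.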

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html
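
For the bonus in the question (total run time over a 24-hour period), a rough sketch is to add up the sampling intervals in which the derivative is positive, which is how the question describes "running". The data below is synthetic (assumption: one sample per minute, compressor on for 10 minutes of every hour), and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 48 hours of 1-minute samples.
index = pd.date_range("2021-02-18 00:00:00", periods=48 * 60,
                      freq=pd.Timedelta(minutes=1))
minute = np.arange(len(index)) % 60

# Pressure climbs 0.1 PSI per sample for 10 samples each hour, then leaks down.
step = np.where((minute >= 1) & (minute <= 10), 0.1, -0.02)
pressure = pd.Series(3.0 + np.cumsum(step), index=index)

# Time since the previous sample, and the rate of change in PSI per second.
dt = pressure.index.to_series().diff()
rate = pressure.diff() / dt.dt.total_seconds()
running = rate > 0  # the first sample has a NaN rate, which compares as False

# Keep only samples from the 24 hours leading up to the last reading.
cutoff = pressure.index[-1] - pd.Timedelta(hours=24)
in_window = pressure.index > cutoff

# Each running sample contributes the interval that led up to it.
run_time = dt[running & in_window].sum()
print(run_time)
```

Here the result is 4 hours, matching 10 minutes per hour over 24 hours. Summing the actual intervals rather than counting samples keeps the total meaningful if the sampling rate is uneven.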

Upvotes: 4
