Reputation: 3828
I have a dataset of 6169 time-series data points, and I am trying to find the minimum within a rolling window of 396 points (slightly over a year). I wrote the code below using the pandas rolling function, but when I run it I get far more values than I expect. I should end up with about 6169/396 ≈ 15 or 16 values, yet I get 258. Any idea why? To give an idea of the data I have posted a plot, with red circles marking a few of the points the code should catch; judging by the graph, it definitely shouldn't catch that many. Is there anything wrong with my code?
m4_minidx = df['fitted.values'].rolling(window = 396).min() == df['fitted.values']
m4_min = df[m4_minidx]
print(df.shape)
print(m4_min.shape)
output:
(6169, 5)
(258, 5)
Upvotes: 2
Views: 5047
Reputation: 3077
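The problem is the rolling window itself. It slides one row at a time, so the comparison is true at every row whose value is the smallest of the preceding 396 rows. On a stretch where the series keeps falling, that is every row, so you get a run of matches instead of one per year. Here's a sketch to explain:
The black lines are the moving window, and the red circles are the local minima.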
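To make the effect concrete, here is a minimal sketch on a made-up series (a straight downward ramp, not the data from the question). It applies the same test as the question's code and counts how many rows pass.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series: a straight downward ramp of 1000 points
s = pd.Series(np.linspace(10.0, 0.0, 1000))
window = 396

# Same test as in the question: does the current value equal
# the minimum of its trailing window?
is_trailing_min = s.rolling(window=window).min() == s

# The first window - 1 rows are NaN (window not yet full) and never match;
# every later row is the lowest value seen so far, so it always matches
n_matches = int(is_trailing_min.sum())
print(n_matches)  # 605, i.e. 1000 - 396 + 1
```

A real series that trends down for a few weeks at a time behaves the same way over each of those stretches, which is where the extra matches come from.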
The problem you want to solve is slightly more complex; finding local minima is not trivial in general. Take a look at these other resources: local minima x-y, local minima in a 1D array, or the peak finder in the scipy library.
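As a rough sketch of the scipy route: scipy.signal.find_peaks finds maxima, so the usual trick is to pass it the negated series. The data below is a made-up sine with a slow upward trend standing in for df['fitted.values'], and distance=300 is an assumed minimum spacing that you would need to tune for your own data.

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical stand-in for df['fitted.values']:
# one cycle every 396 samples plus a slow upward trend
t = np.arange(6169)
x = np.sin(2 * np.pi * t / 396) + 0.0001 * t

# Minima of x are peaks of -x; distance is the minimum number of
# samples required between two reported peaks (assumed value, tune it)
min_idx, _ = find_peaks(-x, distance=300)

print(len(min_idx))   # number of local minima found
print(min_idx[:3])    # positions of the first few
```

On real data you would pass df['fitted.values'].to_numpy() instead of x, and then select the rows with df.iloc[min_idx].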
============= edit ==================
If your data has no repeated values and no long falling stretches (random noise, for example), you get roughly the count you expected:
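import numpy as np
import pandas as pd
# Uniform random values stand in for the real series
x = np.random.random(6169)
df = pd.DataFrame({'fitted.values': x})
# True where a value equals the minimum of its trailing 396-row window
m4_minidx = df['fitted.values'].rolling(window=396).min() == df['fitted.values']
m4_min = df[m4_minidx]
print(df.shape)
print(m4_min.shape)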
output:
(6169, 1)
(14, 1)
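If what you actually want is exactly one minimum per consecutive 396-row stretch, which is what the 6169/396 estimate in the question implies, one option is to split the rows into non-overlapping blocks instead of using a rolling window. This is a sketch on random data with a fixed seed; the names block, block_min_idx and m4_block_min are only illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
df = pd.DataFrame({'fitted.values': rng.random(6169)})
window = 396

# Label each row with its block number: rows 0-395 -> 0, 396-791 -> 1, ...
block = np.arange(len(df)) // window

# Index label of the smallest value inside each block
block_min_idx = df.groupby(block)['fitted.values'].idxmin()
m4_block_min = df.loc[block_min_idx]

print(m4_block_min.shape)  # (16, 1): 15 full blocks plus one partial block
```

Note that this gives one minimum per block, not true local minima: the lowest point of a block can sit right at its edge, next to an even lower point in the neighbouring block.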
Upvotes: 3