Reputation: 14661
I have the following dataframe:
date value
2014-01-20 10
2014-01-21 12
2014-01-22 13
2014-01-23 9
2014-01-24 7
2014-01-25 12
2014-01-26 11
I need to be able to keep track of when the latest maximum and minimum value occurred within a specific rolling window. For example if I were to use a rolling window period of 5, then I would need an output like the following:
date value rolling_max_date rolling_min_date
2014-01-20 10 2014-01-20 2014-01-20
2014-01-21 12 2014-01-21 2014-01-20
2014-01-22 13 2014-01-22 2014-01-20
2014-01-23 9 2014-01-22 2014-01-23
2014-01-24 7 2014-01-22 2014-01-24
2014-01-25 12 2014-01-22 2014-01-24
2014-01-26 11 2014-01-25 2014-01-24
All this shows is, what is the date of the latest maximum and minimum value within the rolling window. I know pandas has rolling_min and rolling_max, but im not sure how to keep track of the index/date of when the most recent max/min occured within the window.
Upvotes: 6
Views: 702
Reputation: 24752
Here is a workaround.
import pandas as pd
import numpy as np
# sample data
# ===============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1,30,20), index=pd.date_range('2015-01-01', periods=20, freq='D'), columns=['value'])
df
value
2015-01-01 13
2015-01-02 16
2015-01-03 22
2015-01-04 1
2015-01-05 4
2015-01-06 28
2015-01-07 4
2015-01-08 8
2015-01-09 10
2015-01-10 20
2015-01-11 22
2015-01-12 19
2015-01-13 5
2015-01-14 24
2015-01-15 7
2015-01-16 25
2015-01-17 25
2015-01-18 13
2015-01-19 27
2015-01-20 2
# processing
# ==========================================
# your cumstom function to track on max/min value/date
def track_minmax(df):
return pd.Series({'current_date': df.index[-1], 'rolling_max_val': df['value'].max(), 'rolling_max_date': df['value'].idxmax(), 'rolling_min_val': df['value'].min(), 'rolling_min_date': df['value'].idxmin()})
window = 5
# use list comprehension to do the for loop
pd.DataFrame([track_minmax(df.iloc[i:i+window]) for i in range(len(df)-window+1)]).set_index('current_date').reindex(df.index)
rolling_max_date rolling_max_val rolling_min_date rolling_min_val
2015-01-01 NaT NaN NaT NaN
2015-01-02 NaT NaN NaT NaN
2015-01-03 NaT NaN NaT NaN
2015-01-04 NaT NaN NaT NaN
2015-01-05 2015-01-03 22 2015-01-04 1
2015-01-06 2015-01-06 28 2015-01-04 1
2015-01-07 2015-01-06 28 2015-01-04 1
2015-01-08 2015-01-06 28 2015-01-04 1
2015-01-09 2015-01-06 28 2015-01-05 4
2015-01-10 2015-01-06 28 2015-01-07 4
2015-01-11 2015-01-11 22 2015-01-07 4
2015-01-12 2015-01-11 22 2015-01-08 8
2015-01-13 2015-01-11 22 2015-01-13 5
2015-01-14 2015-01-14 24 2015-01-13 5
2015-01-15 2015-01-14 24 2015-01-13 5
2015-01-16 2015-01-16 25 2015-01-13 5
2015-01-17 2015-01-16 25 2015-01-13 5
2015-01-18 2015-01-16 25 2015-01-15 7
2015-01-19 2015-01-19 27 2015-01-15 7
2015-01-20 2015-01-19 27 2015-01-20 2
Upvotes: 1
Reputation: 139312
There is a more general rolling_apply
where you can provide your own function. However, the custom functions receives the windows as arrays, not dataframes, so the index information is not available (so you cannot use idxmin/max
).
But lets try to achieve this in two steps:
In [41]: df = df.set_index('date')
In [42]: pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
Out[42]:
value
date
2014-01-20 0
2014-01-21 0
2014-01-22 0
2014-01-23 3
2014-01-24 4
2014-01-25 3
2014-01-26 2
This gives you the index in the window where the minimum is found. But, this index is for that particular window and not for the entire dataframe. So let's add the start of the window, and then use this integer location to retrieve the correct index locations index:
In [45]: ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
In [46]: ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
In [47]: ilocs
Out[47]:
date
2014-01-20 0
2014-01-21 0
2014-01-22 0
2014-01-23 3
2014-01-24 4
2014-01-25 4
2014-01-26 4
Name: value, dtype: float64
In [48]: df.index.take(ilocs)
Out[48]:
Index([u'2014-01-20', u'2014-01-20', u'2014-01-20', u'2014-01-23',
u'2014-01-24', u'2014-01-24', u'2014-01-24'],
dtype='object', name=u'date')
In [49]: df['rolling_min_date'] = df.index.take(ilocs)
In [50]: df
Out[50]:
value rolling_min_date
date
2014-01-20 10 2014-01-20
2014-01-21 12 2014-01-20
2014-01-22 13 2014-01-20
2014-01-23 9 2014-01-23
2014-01-24 7 2014-01-24
2014-01-25 12 2014-01-24
2014-01-26 11 2014-01-24
The same can be done for the maximum:
ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmax(), min_periods=1)
ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
df['rolling_max_date'] = df.index.take(ilocs)
Upvotes: 4