darkpool
darkpool

Reputation: 14661

Most recent max/min value

I have the following dataframe:

date          value
2014-01-20    10
2014-01-21    12
2014-01-22    13
2014-01-23    9
2014-01-24    7
2014-01-25    12
2014-01-26    11

I need to be able to keep track of when the latest maximum and minimum value occurred within a specific rolling window. For example if I were to use a rolling window period of 5, then I would need an output like the following:

date          value   rolling_max_date    rolling_min_date
2014-01-20    10      2014-01-20          2014-01-20
2014-01-21    12      2014-01-21          2014-01-20
2014-01-22    13      2014-01-22          2014-01-20
2014-01-23    9       2014-01-22          2014-01-23
2014-01-24    7       2014-01-22          2014-01-24
2014-01-25    12      2014-01-22          2014-01-24
2014-01-26    11      2014-01-25          2014-01-24

All this shows is, what is the date of the latest maximum and minimum value within the rolling window. I know pandas has rolling_min and rolling_max, but im not sure how to keep track of the index/date of when the most recent max/min occured within the window.

Upvotes: 6

Views: 702

Answers (2)

Jianxun Li
Jianxun Li

Reputation: 24752

Here is a workaround.

import pandas as pd
import numpy as np

# sample data
# ===============================================
np.random.seed(0)
df = pd.DataFrame(np.random.randint(1,30,20), index=pd.date_range('2015-01-01', periods=20, freq='D'), columns=['value'])
df

            value
2015-01-01     13
2015-01-02     16
2015-01-03     22
2015-01-04      1
2015-01-05      4
2015-01-06     28
2015-01-07      4
2015-01-08      8
2015-01-09     10
2015-01-10     20
2015-01-11     22
2015-01-12     19
2015-01-13      5
2015-01-14     24
2015-01-15      7
2015-01-16     25
2015-01-17     25
2015-01-18     13
2015-01-19     27
2015-01-20      2

# processing
# ==========================================
# your cumstom function to track on max/min value/date
def track_minmax(df):
    return pd.Series({'current_date': df.index[-1], 'rolling_max_val': df['value'].max(), 'rolling_max_date': df['value'].idxmax(), 'rolling_min_val': df['value'].min(), 'rolling_min_date': df['value'].idxmin()})

window = 5
# use list comprehension to do the for loop
pd.DataFrame([track_minmax(df.iloc[i:i+window]) for i in range(len(df)-window+1)]).set_index('current_date').reindex(df.index)

           rolling_max_date  rolling_max_val rolling_min_date  rolling_min_val
2015-01-01              NaT              NaN              NaT              NaN
2015-01-02              NaT              NaN              NaT              NaN
2015-01-03              NaT              NaN              NaT              NaN
2015-01-04              NaT              NaN              NaT              NaN
2015-01-05       2015-01-03               22       2015-01-04                1
2015-01-06       2015-01-06               28       2015-01-04                1
2015-01-07       2015-01-06               28       2015-01-04                1
2015-01-08       2015-01-06               28       2015-01-04                1
2015-01-09       2015-01-06               28       2015-01-05                4
2015-01-10       2015-01-06               28       2015-01-07                4
2015-01-11       2015-01-11               22       2015-01-07                4
2015-01-12       2015-01-11               22       2015-01-08                8
2015-01-13       2015-01-11               22       2015-01-13                5
2015-01-14       2015-01-14               24       2015-01-13                5
2015-01-15       2015-01-14               24       2015-01-13                5
2015-01-16       2015-01-16               25       2015-01-13                5
2015-01-17       2015-01-16               25       2015-01-13                5
2015-01-18       2015-01-16               25       2015-01-15                7
2015-01-19       2015-01-19               27       2015-01-15                7
2015-01-20       2015-01-19               27       2015-01-20                2

Upvotes: 1

joris
joris

Reputation: 139312

There is a more general rolling_apply where you can provide your own function. However, the custom functions receives the windows as arrays, not dataframes, so the index information is not available (so you cannot use idxmin/max).

But lets try to achieve this in two steps:

In [41]: df = df.set_index('date')
In [42]: pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)
Out[42]:
            value
date
2014-01-20      0
2014-01-21      0
2014-01-22      0
2014-01-23      3
2014-01-24      4
2014-01-25      3
2014-01-26      2

This gives you the index in the window where the minimum is found. But, this index is for that particular window and not for the entire dataframe. So let's add the start of the window, and then use this integer location to retrieve the correct index locations index:

In [45]: ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmin(), min_periods=1)

In [46]: ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))

In [47]: ilocs
Out[47]:
date
2014-01-20    0
2014-01-21    0
2014-01-22    0
2014-01-23    3
2014-01-24    4
2014-01-25    4
2014-01-26    4
Name: value, dtype: float64

In [48]: df.index.take(ilocs)
Out[48]:
Index([u'2014-01-20', u'2014-01-20', u'2014-01-20', u'2014-01-23',
       u'2014-01-24', u'2014-01-24', u'2014-01-24'],
      dtype='object', name=u'date')

In [49]: df['rolling_min_date'] = df.index.take(ilocs)

In [50]: df
Out[50]:
            value rolling_min_date
date
2014-01-20     10       2014-01-20
2014-01-21     12       2014-01-20
2014-01-22     13       2014-01-20
2014-01-23      9       2014-01-23
2014-01-24      7       2014-01-24
2014-01-25     12       2014-01-24
2014-01-26     11       2014-01-24

The same can be done for the maximum:

ilocs_window = pd.rolling_apply(df, window=5, func=lambda x: x.argmax(), min_periods=1)
ilocs = ilocs_window['value'] + ([0, 0, 0, 0] + range(len(ilocs_window)-4))
df['rolling_max_date'] = df.index.take(ilocs)

Upvotes: 4

Related Questions