Jay Murphy
Jay Murphy

Reputation: 1

Computing the min() and max() of a dataframe for a resampled period but also retrieve the exact dates the min and max occurred?

I have a dataframe that has a datetime index and a single column containing the price of the asset. For example,

    date         asset_value
    2023-05-30   136.57000000
    2023-05-31   133.30000000
    2023-06-01   134.83000000
    2023-06-02   134.63000000
    2023-06-05   133.73000000

I have several years of data. I'd like to resample the dataframe for different periods and compute the min() and max() values for that period. For example,

    df.resample("W").agg({'asset_value': ["min", "max"]})

    date            min             max
    2023-06-04  133.30000000    136.57000000
    2023-06-11  133.73000000    136.20000000
    2023-06-18  138.93000000    141.79000000
    2023-06-25  136.07000000    138.92000000
    2023-07-02  139.80000000    146.55000000

This returns the min, max and right edge date for the bin. However, I would also like to know the date the min and max occurred.

Any advice how to do this? Thanks.

I did try adding additional columns to the original dataframe hoping that information would display in the resampled dataframe, but it didn't. See below.

    #Make sure the index is a datetime index
    df.index = pd.to_datetime(df.index)
    df.reset_index(inplace=True)

    #Add the day of the week, day, month and year as columns
    df['day_of_week'] = df['date'].dt.dayofweek
    df['day_of_month'] = df['date'].dt.day
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year

    #Set the index back to my datetime index and drop the date column.
    df.set_index(df['date'], inplace=True)
    df.drop(['date'], axis=1, inplace=True)

Upvotes: 0

Views: 68

Answers (1)

Panda Kim
Panda Kim

Reputation: 13267

Example Code

import pandas as pd
data = {'date': ['2023-05-30', '2023-05-31', '2023-06-01', '2023-06-02', '2023-06-05'], 
        'asset_value': [136.57, 133.3, 134.83, 134.63, 133.73]}
df = pd.DataFrame(data)

df

         date  asset_value
0  2023-05-30       136.57
1  2023-05-31       133.30
2  2023-06-01       134.83
3  2023-06-02       134.63
4  2023-06-05       133.73

Code

use idxmax and idxmin

df['date'] = pd.to_datetime(df['date'])

out = (df.set_index('date')
         .resample('W')['asset_value']
         .agg(['min', 'max', ('min_date', 'idxmin'), ('max_date', 'idxmax')])
)

out

               min     max   min_date   max_date
date                                            
2023-06-04  133.30  136.57 2023-05-31 2023-05-30
2023-06-11  133.73  133.73 2023-06-05 2023-06-05

However, using idxmax and idxmin will only give you the earliest date that occurred, even if there are multiple dates that correspond to the maximum or minimum. I've used idxmax and idxmin because there's no explanation of what to do when you have multiple dates, and there are different solutions depending on how you want to handle it.


Update

when exist multiple minimum or maximum

example:

import pandas as pd
data = {'date': ['2023-05-30', '2023-05-31', '2023-06-01', '2023-06-02', '2023-06-05'], 
        'asset_value': [136.57, 133.3, 133.3, 134.63, 133.73]}
df = pd.DataFrame(data)

df

         date  asset_value
0  2023-05-30       136.57
1  2023-05-31       133.30 <--
2  2023-06-01       133.30 <--
3  2023-06-02       134.63
4  2023-06-05       133.73

code:

df['date'] = pd.to_datetime(df['date'])

cond1 = df.resample('W', on='date')['asset_value'].transform('min') == df['asset_value']
cond2 = df.resample('W', on='date')['asset_value'].transform('max') == df['asset_value']

out = (df.resample('W', on='date')['asset_value']
         .agg(['min', 'max'])
)
out['min_date'] = df[cond1].resample('W', on='date')['date'].agg(list)
out['max_date'] = df[cond2].resample('W', on='date')['date'].agg(list)

out:

enter image description here

Upvotes: 0

Related Questions