Reputation: 1
I have a dataframe that has a datetime index and a single column containing the price of the asset. For example,
date asset_value
2023-05-30 136.57000000
2023-05-31 133.30000000
2023-06-01 134.83000000
2023-06-02 134.63000000
2023-06-05 133.73000000
I have several years of data. I'd like to resample the dataframe for different periods and compute the min() and max() values for that period. For example,
df.resample("W").agg({'asset_value': ["min", "max"]})
date min max
2023-06-04 133.30000000 136.57000000
2023-06-11 133.73000000 136.20000000
2023-06-18 138.93000000 141.79000000
2023-06-25 136.07000000 138.92000000
2023-07-02 139.80000000 146.55000000
This returns the min, max and right edge date for the bin. However, I would also like to know the date the min and max occurred.
Any advice how to do this? Thanks.
I did try adding additional columns to the original dataframe hoping that information would display in the resampled dataframe, but it didn't. See below.
#Make sure the index is a datetime index
df.index = pd.to_datetime(df.index)
df.reset_index(inplace=True)
#Add the day of the week, day, month and year as columns
df['day_of_week'] = df['date'].dt.dayofweek
df['day_of_month'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
#Set the index back to my datetime index and drop the date column.
df.set_index(df['date'], inplace=True)
df.drop(['date'], axis=1, inplace=True)
Upvotes: 0
Views: 68
Reputation: 13267
Example Code
import pandas as pd
data = {'date': ['2023-05-30', '2023-05-31', '2023-06-01', '2023-06-02', '2023-06-05'],
'asset_value': [136.57, 133.3, 134.83, 134.63, 133.73]}
df = pd.DataFrame(data)
df
date asset_value
0 2023-05-30 136.57
1 2023-05-31 133.30
2 2023-06-01 134.83
3 2023-06-02 134.63
4 2023-06-05 133.73
Code
use idxmax
and idxmin
df['date'] = pd.to_datetime(df['date'])
out = (df.set_index('date')
.resample('W')['asset_value']
.agg(['min', 'max', ('min_date', 'idxmin'), ('max_date', 'idxmax')])
)
out
min max min_date max_date
date
2023-06-04 133.30 136.57 2023-05-31 2023-05-30
2023-06-11 133.73 133.73 2023-06-05 2023-06-05
However, using idxmax
and idxmin
will only give you the earliest date that occurred, even if there are multiple dates that correspond to the maximum or minimum. I've used idxmax
and idxmin
because there's no explanation of what to do when you have multiple dates, and there are different solutions depending on how you want to handle it.
Update
when exist multiple minimum or maximum
example:
import pandas as pd
data = {'date': ['2023-05-30', '2023-05-31', '2023-06-01', '2023-06-02', '2023-06-05'],
'asset_value': [136.57, 133.3, 133.3, 134.63, 133.73]}
df = pd.DataFrame(data)
df
date asset_value
0 2023-05-30 136.57
1 2023-05-31 133.30 <--
2 2023-06-01 133.30 <--
3 2023-06-02 134.63
4 2023-06-05 133.73
code:
df['date'] = pd.to_datetime(df['date'])
cond1 = df.resample('W', on='date')['asset_value'].transform('min') == df['asset_value']
cond2 = df.resample('W', on='date')['asset_value'].transform('max') == df['asset_value']
out = (df.resample('W', on='date')['asset_value']
.agg(['min', 'max'])
)
out['min_date'] = df[cond1].resample('W', on='date')['date'].agg(list)
out['max_date'] = df[cond2].resample('W', on='date')['date'].agg(list)
out:
Upvotes: 0