Reputation: 12627
I have something like the following dataframe (notice dt
is the index)
fx fy
dt
2019-05-29 0.000000 0.000000
2019-05-30 65.410004 156.449997
2019-05-31 70.279999 125.040001
2019-06-01 49.220001 147.979996
2019-06-02 100.580002 232.539993
2019-06-03 262.230011 468.809998
2019-06-04 383.779999 525.390015
2019-06-05 761.609985 1147.380005
2019-06-06 1060.750000 1727.380005
2019-06-07 1640.300049 2827.120117
What I want to achieve is the have a new column named fz
where each day's value, is the previous month's max value of fy
- so the result would be
fx fy fz
dt
2019-05-29 0.000000 0.000000 NaN
2019-05-30 65.410004 156.449997 NaN
2019-05-31 70.279999 125.040001 NaN
2019-06-01 49.220001 147.979996 156.449997
2019-06-02 100.580002 232.539993 156.449997
2019-06-03 262.230011 468.809998 156.449997
2019-06-04 383.779999 525.390015 156.449997
2019-06-05 761.609985 1147.380005 156.449997
2019-06-06 1060.750000 1727.380005 156.449997
2019-06-07 1640.300049 2827.120117 156.449997
The first month's fz
is empty because there is no previous month. I tried a combination of pd.Grouper(freq='M')
with .transform()
and .shift(-1, freq='M')
but failed miserably as it changed the index entirely, and I would like to keep the index as is.
How can I solve this for arbitrary N
months back?
Upvotes: 1
Views: 70
Reputation: 11
you can do it in two steps:
create a table with maximum values + shift per monthly period:
maximum_shift = df.resample('M')['fy'].max().shift().to_period('M')
concatenate/merge it to the original data frame:
pd.DataFrame(pd.concat([df.to_period('M'), maximum_shift], axis=1).values, index=df.index, columns=df.columns.tolist()+['fz'])
Upvotes: 1
Reputation: 862841
Use DatetimeIndex.to_period
for month period with shifting and mapping by Index.map
:
#changed datetimeindex
print (df)
fx fy
dt
2019-05-29 0.000000 0.000000
2019-05-30 65.410004 156.449997
2019-05-31 70.279999 125.040001
2019-06-01 49.220001 147.979996
2019-06-02 100.580002 232.539993
2019-07-03 262.230011 468.809998
2019-07-04 383.779999 525.390015
2019-08-05 761.609985 1147.380005
2019-08-06 1060.750000 1727.380005
2019-09-07 1640.300049 2827.120117
N = 2
s = df.index.to_period('m')
df['fz'] = s.map(df.groupby(s)['fy'].max().shift(N))
print (df)
fx fy fz
dt
2019-05-29 0.000000 0.000000 NaN
2019-05-30 65.410004 156.449997 NaN
2019-05-31 70.279999 125.040001 NaN
2019-06-01 49.220001 147.979996 NaN
2019-06-02 100.580002 232.539993 NaN
2019-07-03 262.230011 468.809998 156.449997
2019-07-04 383.779999 525.390015 156.449997
2019-08-05 761.609985 1147.380005 232.539993
2019-08-06 1060.750000 1727.380005 232.539993
2019-09-07 1640.300049 2827.120117 525.390015
Solution if datetimes are not conecutive, missing some months with add N
to PeriodIndex
by rename
:
print (df)
fx fy
dt
2019-05-29 0.000000 0.000000
2019-05-30 65.410004 156.449997
2019-05-31 70.279999 125.040001
2019-06-01 49.220001 147.979996
2019-06-02 100.580002 232.539993
2019-08-03 262.230011 468.809998
2019-08-04 383.779999 525.390015
2019-09-05 761.609985 1147.380005
2019-09-06 1060.750000 1727.380005
2019-09-07 1640.300049 2827.120117
N = 1
s = df.index.to_period('m')
df['fz'] = s.map(df.groupby(s)['fy'].max().rename(lambda x: x + N))
print (df)
fx fy fz
dt
2019-05-29 0.000000 0.000000 NaN
2019-05-30 65.410004 156.449997 NaN
2019-05-31 70.279999 125.040001 NaN
2019-06-01 49.220001 147.979996 156.449997
2019-06-02 100.580002 232.539993 156.449997
2019-08-03 262.230011 468.809998 NaN
2019-08-04 383.779999 525.390015 NaN
2019-09-05 761.609985 1147.380005 525.390015
2019-09-06 1060.750000 1727.380005 525.390015
2019-09-07 1640.300049 2827.120117 525.390015
Upvotes: 2