bluesummers
bluesummers

Reputation: 12627

Pandas DateTimeIndex - create a new value which has the max value of previous month

I have something like the following dataframe (notice dt is the index)

            fx          fy
dt      
2019-05-29  0.000000    0.000000
2019-05-30  65.410004   156.449997
2019-05-31  70.279999   125.040001
2019-06-01  49.220001   147.979996
2019-06-02  100.580002  232.539993
2019-06-03  262.230011  468.809998
2019-06-04  383.779999  525.390015
2019-06-05  761.609985  1147.380005
2019-06-06  1060.750000 1727.380005
2019-06-07  1640.300049 2827.120117

What I want to achieve is the have a new column named fz where each day's value, is the previous month's max value of fy - so the result would be

            fx          fy          fz
dt          
2019-05-29  0.000000    0.000000    NaN
2019-05-30  65.410004   156.449997  NaN
2019-05-31  70.279999   125.040001  NaN
2019-06-01  49.220001   147.979996  156.449997
2019-06-02  100.580002  232.539993  156.449997
2019-06-03  262.230011  468.809998  156.449997
2019-06-04  383.779999  525.390015  156.449997
2019-06-05  761.609985  1147.380005 156.449997
2019-06-06  1060.750000 1727.380005 156.449997
2019-06-07  1640.300049 2827.120117 156.449997

The first month's fz is empty because there is no previous month. I tried a combination of pd.Grouper(freq='M') with .transform() and .shift(-1, freq='M') but failed miserably as it changed the index entirely, and I would like to keep the index as is.

How can I solve this for arbitrary N months back?

Upvotes: 1

Views: 70

Answers (2)

Yoav Wigelman
Yoav Wigelman

Reputation: 11

you can do it in two steps:

  1. create a table with maximum values + shift per monthly period:

    maximum_shift = df.resample('M')['fy'].max().shift().to_period('M')
    
  2. concatenate/merge it to the original data frame:

    pd.DataFrame(pd.concat([df.to_period('M'), maximum_shift], axis=1).values, index=df.index, columns=df.columns.tolist()+['fz'])
    

Upvotes: 1

jezrael
jezrael

Reputation: 862841

Use DatetimeIndex.to_period for month period with shifting and mapping by Index.map:

#changed datetimeindex
print (df)
                     fx           fy
dt                                  
2019-05-29     0.000000     0.000000
2019-05-30    65.410004   156.449997
2019-05-31    70.279999   125.040001
2019-06-01    49.220001   147.979996
2019-06-02   100.580002   232.539993
2019-07-03   262.230011   468.809998
2019-07-04   383.779999   525.390015
2019-08-05   761.609985  1147.380005
2019-08-06  1060.750000  1727.380005
2019-09-07  1640.300049  2827.120117

N = 2
s = df.index.to_period('m')
df['fz'] = s.map(df.groupby(s)['fy'].max().shift(N))
print (df)
                     fx           fy          fz
dt                                              
2019-05-29     0.000000     0.000000         NaN
2019-05-30    65.410004   156.449997         NaN
2019-05-31    70.279999   125.040001         NaN
2019-06-01    49.220001   147.979996         NaN
2019-06-02   100.580002   232.539993         NaN
2019-07-03   262.230011   468.809998  156.449997
2019-07-04   383.779999   525.390015  156.449997
2019-08-05   761.609985  1147.380005  232.539993
2019-08-06  1060.750000  1727.380005  232.539993
2019-09-07  1640.300049  2827.120117  525.390015

Solution if datetimes are not conecutive, missing some months with add N to PeriodIndex by rename:

print (df)
                     fx           fy
dt                                  
2019-05-29     0.000000     0.000000
2019-05-30    65.410004   156.449997
2019-05-31    70.279999   125.040001
2019-06-01    49.220001   147.979996
2019-06-02   100.580002   232.539993
2019-08-03   262.230011   468.809998
2019-08-04   383.779999   525.390015
2019-09-05   761.609985  1147.380005
2019-09-06  1060.750000  1727.380005
2019-09-07  1640.300049  2827.120117

N = 1
s = df.index.to_period('m')
df['fz'] = s.map(df.groupby(s)['fy'].max().rename(lambda x: x + N))
print (df)
                     fx           fy          fz
dt                                              
2019-05-29     0.000000     0.000000         NaN
2019-05-30    65.410004   156.449997         NaN
2019-05-31    70.279999   125.040001         NaN
2019-06-01    49.220001   147.979996  156.449997
2019-06-02   100.580002   232.539993  156.449997
2019-08-03   262.230011   468.809998         NaN
2019-08-04   383.779999   525.390015         NaN
2019-09-05   761.609985  1147.380005  525.390015
2019-09-06  1060.750000  1727.380005  525.390015
2019-09-07  1640.300049  2827.120117  525.390015

Upvotes: 2

Related Questions