Alex Friedman

Reputation: 209

Get last value per year counting from the last date in Pandas

What's the best way in pandas to resample or group by year, where the years are not calendar years but full one-year windows counted back from the last date in the data?

Example data set

import pandas as pd

df = pd.DataFrame({
  'MyDate': ['2017-02-01', '2017-07-05', '2017-08-26', '2017-09-03', '2018-02-04',
             '2018-08-03', '2018-08-10', '2018-12-03', '2019-07-13', '2019-08-15'],
  'MyValue': [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
})
       MyDate  MyValue
0  2017-02-01      100
1  2017-07-05       90
2  2017-08-26       80
3  2017-09-03       70
4  2018-02-04       60
5  2018-08-03       50
6  2018-08-10       40
7  2018-12-03       30
8  2019-07-13       20
9  2019-08-15       10

Example result

The last date is 2019-08-15, so I'd like to group by the last full year, 2018-08-16 to 2019-08-15, then 2017-08-16 to 2018-08-15, etc.

Here is the last row of each such year:

       MyDate  MyValue
0  2017-07-05       90
1  2018-08-10       40
2  2019-08-15       10

Upvotes: 3

Views: 1460

Answers (3)

willeM_ Van Onsem

Reputation: 476584

First, parse the dates into real datetime objects:

df['MyDate'] = pd.to_datetime(df['MyDate'])

Next, group by the number of whole years between each date and the last date, computed with relativedelta from the python-dateutil package:

>>> from operator import attrgetter
>>> from dateutil.relativedelta import relativedelta
>>> df.groupby(df['MyDate'].apply(relativedelta, dt2=df['MyDate'].max()).apply(attrgetter('years'))).last()
           MyDate  MyValue
MyDate                    
-2     2017-07-05       90
-1     2018-08-10       40
 0     2019-08-15       10
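The one-liner above can be hard to read, so here is a step-by-step sketch of the same idea. The helper name whole_years_from_last and the column label year_offset are made up for illustration; the data is the example frame from the question.

```python
import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame({
    'MyDate': ['2017-02-01', '2017-07-05', '2017-08-26', '2017-09-03', '2018-02-04',
               '2018-08-03', '2018-08-10', '2018-12-03', '2019-07-13', '2019-08-15'],
    'MyValue': [100, 90, 80, 70, 60, 50, 40, 30, 20, 10],
})
df['MyDate'] = pd.to_datetime(df['MyDate'])

last_date = df['MyDate'].max()

def whole_years_from_last(date):
    # Hypothetical helper: relativedelta(dt1, dt2) describes dt1 - dt2, and
    # .years is its whole-year part. It is 0 for dates less than a year before
    # the last date, -1 for the year before that, and so on.
    return relativedelta(date, last_date).years

# One integer label per row, then the last row of each label.
year_offset = df['MyDate'].apply(whole_years_from_last).rename('year_offset')
result = df.groupby(year_offset).last()
print(result)
```

A date exactly one year before the last date (2018-08-15 here) gets the label -1, so each window is open on the left and closed on the right, which matches the ranges asked for in the question.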

Upvotes: 3

jezrael

Reputation: 862591

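You can subtract the last date from each date, convert the difference to approximate whole years, and pass the resulting Series to groupby with GroupBy.last (iat[-1] assumes the rows are sorted by date):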

df['MyDate'] = pd.to_datetime(df['MyDate'])

s = (df['MyDate'].sub(df['MyDate'].iat[-1]).dt.days / 365.25).astype(int)
df = df.groupby(s).last().reset_index(drop=True)
print(df)
      MyDate  MyValue
0 2017-07-05       90
1 2018-08-10       40
2 2019-08-15       10
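Dividing by 365.25 is an approximation, so a date sitting right on a window boundary can land one group off: 2018-08-15 is 365 days before the last date, and 365 / 365.25 truncates to 0. As a hedged alternative, the sketch below derives the group from the calendar components instead. The helper name years_back is made up for illustration, and it assumes the dates carry no time-of-day component.

```python
import pandas as pd

df = pd.DataFrame({
    'MyDate': ['2017-02-01', '2017-07-05', '2017-08-26', '2017-09-03', '2018-02-04',
               '2018-08-03', '2018-08-10', '2018-12-03', '2019-07-13', '2019-08-15'],
    'MyValue': [100, 90, 80, 70, 60, 50, 40, 30, 20, 10],
})
df['MyDate'] = pd.to_datetime(df['MyDate'])

def years_back(dates, last):
    """Hypothetical helper: how many one-year windows each date lies before `last`."""
    # Encode (month, day) as one integer, e.g. 815 for Aug 15, so it can be compared.
    month_day = dates.dt.month * 100 + dates.dt.day
    # Dates later in the year than the anniversary belong to the more recent window.
    after_anniversary = month_day > (last.month * 100 + last.day)
    return (last.year - dates.dt.year - after_anniversary.astype(int)).rename('years_back')

last = df['MyDate'].max()  # max() does not depend on the row order
result = (df.groupby(years_back(df['MyDate'], last)).last()
            .sort_index(ascending=False)   # oldest window first
            .reset_index(drop=True))
print(result)

# Boundary check: the two approaches disagree on 2018-08-15.
boundary = pd.Series(pd.to_datetime(['2018-08-15', '2018-08-16']))
exact = years_back(boundary, last).tolist()
approx = (boundary.sub(last).dt.days / 365.25).astype(int).tolist()
```

For the example data both approaches give the same three rows; they only differ for dates that fall exactly on, or very close to, an anniversary of the last date.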

Upvotes: 3

ALollz

Reputation: 59529

One way is to use pd.cut, building the bin edges with pd.offsets.DateOffset so that each bin is a one-year window counted back from the last date.

import numpy as np
import pandas as pd

df['MyDate'] = pd.to_datetime(df['MyDate'])

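# Number of bin edges: approximate span of the data in years, rounded up, plus one.
# (Dividing by np.timedelta64(1, 'Y') is rejected by recent pandas versions.)
N = int(np.ceil((df.MyDate.max() - df.MyDate.min()).days / 365.25)) + 1
# Edges at the last date minus 0, 1, 2, ... years, in ascending order
bins = [df.MyDate.max() - pd.offsets.DateOffset(years=y) for y in range(N)][::-1]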

df.groupby(pd.cut(df.MyDate, bins)).last()

#                             MyDate  MyValue
#MyDate                                      
#(2016-08-15, 2017-08-15] 2017-07-05       90
#(2017-08-15, 2018-08-15] 2018-08-10       40
#(2018-08-15, 2019-08-15] 2019-08-15       10

Upvotes: 2
