Spinor8

Reputation: 1607

Pandas resampling to the existing index

I have a long timeseries that ends with the following data.

               ABC     CDE
Date                      
2017-05-26  107.00  241.71
2017-05-30  107.27  241.50
2017-05-31  107.32  241.44
2017-06-01  107.10  243.36
2017-06-02  107.57  244.17

I would like to resample it to monthly data while retaining the actual last date of each month as it appears in the time series. If I run

df.resample('BM').last()

I get the following tail-end output

2017-05-31  107.32  241.44 
2017-06-30  107.57  244.17

which does not give the correct last date of the dataframe (2017-06-02). Other dates in the resampled dataframe are off as well. Essentially, Pandas isn't using the existing index to find the month end but its own business day calendar.
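
To reproduce the mismatch, here is a minimal sketch that rebuilds only the five rows shown above (the real series is longer). It passes the `pd.offsets.BMonthEnd()` offset object instead of the `'BM'` string, on the assumption that the two are equivalent; the object form is used because recent pandas releases rename that alias to `'BME'`.

```python
import pandas as pd

# Rebuild the tail of the series shown above.
df = pd.DataFrame(
    {
        "ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
        "CDE": [241.71, 241.50, 241.44, 243.36, 244.17],
    },
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date",
    ),
)

# Business-month-end resampling labels each bin with the calendar's
# business month end, not with the last date present in the data.
monthly = df.resample(pd.offsets.BMonthEnd()).last()

print(monthly.index[-1])  # label produced by resample
print(df.index[-1])       # last date actually present in the data
```

The values in the last row are the ones from 2017-06-02, but the label is the business month end of June.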

Is there an option I can pass to the Pandas resampling function so that it uses the existing index to produce the desired result, which is

2017-05-31  107.32  241.44 
2017-06-02  107.57  244.17

Upvotes: 5

Views: 7224

Answers (2)

Allen Qin

Reputation: 19947

You can drop duplicates based on year and month and keep only the last row.

df.assign(m=df.index.to_period('M')).drop_duplicates('m', keep='last').drop(columns='m')
Out[728]: 
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17
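
For a self-contained version of the drop-duplicates idea, a sketch along these lines should work. The sample frame is rebuilt from the question's five rows, and the helper column name `m` is arbitrary. It assumes the index is already sorted in ascending date order, since "last" here means last in row order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
        "CDE": [241.71, 241.50, 241.44, 243.36, 244.17],
    },
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date",
    ),
)

out = (
    df.assign(m=df.index.to_period("M"))   # tag each row with its year-month
      .drop_duplicates("m", keep="last")   # keep the last row of every month
      .drop(columns="m")                   # remove the helper column again
)
print(out)
```

Because rows are only filtered, never relabelled, the original dates stay in the index. If the index might be unsorted, calling `df.sort_index()` first would be the safer route.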

Or you can group by year and month and then take the last row from each group.

# Build the month key from the sorted 'Date' column so it stays aligned with the rows
df.reset_index()\
  .sort_values('Date')\
  .assign(m=lambda d: d['Date'].dt.to_period('M'))\
  .groupby(by='m')\
  .last()\
  .set_index('Date')
Out[677]: 
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17

Upvotes: 0

jezrael

Reputation: 862406

You need to create a new column from the index, resample, and then call set_index on that column at the end:

df = df.assign(Date=df.index).resample('BM').last().set_index('Date')
print (df)
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17
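
As a runnable sketch of the same trick, using the question's five rows as sample data: the dates are copied into a regular column so that `last()` carries the real last date of each bin along with the values. `pd.offsets.BMonthEnd()` stands in for the `'BM'` string here, on the assumption that the two are equivalent.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
        "CDE": [241.71, 241.50, 241.44, 243.36, 244.17],
    },
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date",
    ),
)

monthly = (
    df.assign(Date=df.index)               # copy the real dates into a column
      .resample(pd.offsets.BMonthEnd())    # bins are labelled with business month ends
      .last()                              # last value per bin, including the real date
      .set_index("Date")                   # replace the bin labels with the real dates
)
print(monthly)
```

One caveat: a month with no rows at all would still produce a bin, and its `Date` would be `NaT`, so such rows may need to be dropped before `set_index`.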

But if you only need to resample by month period:

m = df.index.to_period('M')
df = df.reset_index().groupby(m).last().set_index('Date')
print (df)
               ABC     CDE
Date                      
2017-05-31  107.32  241.44
2017-06-02  107.57  244.17
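
A possible variant of the month-period grouping, offered as a sketch rather than as part of the answer above: `tail(1)` returns the last row of each group with its original index, so no `reset_index`/`set_index` round trip is needed. The sample frame is rebuilt from the question's five rows and is assumed to be sorted by date.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
        "CDE": [241.71, 241.50, 241.44, 243.36, 244.17],
    },
    index=pd.DatetimeIndex(
        ["2017-05-26", "2017-05-30", "2017-05-31", "2017-06-01", "2017-06-02"],
        name="Date",
    ),
)

# Group rows by calendar month and keep the final row of each group.
month = df.index.to_period("M")
last_rows = df.groupby(month).tail(1)
print(last_rows)
```

The two differ when data is missing: `last()` picks the last non-null value column by column, whereas `tail(1)` returns the final row as it is, NaNs included.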

Upvotes: 7
