Reputation: 1607
I have a long time series that ends with the following data.
ABC CDE
Date
2017-05-26 107.00 241.71
2017-05-30 107.27 241.50
2017-05-31 107.32 241.44
2017-06-01 107.10 243.36
2017-06-02 107.57 244.17
I would like to resample it to monthly data, but I want to keep the actual last date of each month that exists in the time series. If I do
df.resample('BM').last()
the tail of the output is:
2017-05-31 107.32 241.44
2017-06-30 107.57 244.17
This does not give the actual last date in the dataframe, and other dates in the resampled dataframe are off as well. Essentially, pandas is not using the existing index to find each month's end, but its own business-day calendar.
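A minimal sketch reproducing this, built from the five rows shown above (the real series is longer). It passes pd.offsets.BusinessMonthEnd(), the offset object behind the 'BM' alias, instead of the string, to sidestep the alias renaming in newer pandas. The index of the result holds bin labels generated by the offset, not dates taken from the data:

```python
import pandas as pd

# The last five rows from the question (the full series is assumed longer).
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, 244.17]},
    index=pd.to_datetime(["2017-05-26", "2017-05-30", "2017-05-31",
                          "2017-06-01", "2017-06-02"]).rename("Date"),
)

# BusinessMonthEnd is the offset behind the 'BM' alias.
monthly = df.resample(pd.offsets.BusinessMonthEnd()).last()
print(monthly)
# June's bin is labelled 2017-06-30 (the business month end),
# although the last June observation in the data is 2017-06-02.
```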
Is there an option I can pass to the pandas resample function so that it uses the existing index and produces the desired result, which is:
2017-05-31 107.32 241.44
2017-06-02 107.57 244.17
Upvotes: 5
Views: 7224
Reputation: 19947
You can drop duplicates based on year and month, keeping only the last row of each month (this assumes the index is sorted by date).
df.assign(m=df.index.to_period('M')).drop_duplicates('m', keep='last').drop(columns='m')
Out[728]:
ABC CDE
Date
2017-05-31 107.32 241.44
2017-06-02 107.57 244.17
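The same keep-last idea can be written without the helper column by masking with Index.duplicated. This is a minimal sketch on the question's sample rows; it calls sort_index() first because keep='last' keeps whichever row comes last in the current row order, not necessarily the latest date:

```python
import pandas as pd

# The question's sample rows.
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, 244.17]},
    index=pd.to_datetime(["2017-05-26", "2017-05-30", "2017-05-31",
                          "2017-06-01", "2017-06-02"]).rename("Date"),
)

df = df.sort_index()  # make "last" mean "latest date"
# Flag every row except the final one of its calendar month.
dup = df.index.to_period("M").duplicated(keep="last")
out = df[~dup]
print(out)
```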
Alternatively, you can group by year and month and take the last row of each group.
(df.reset_index()
   .sort_values('Date')
   # derive the month from the sorted 'Date' column so rows stay aligned
   .assign(m=lambda d: d['Date'].dt.to_period('M'))
   .groupby('m')
   .last()
   .set_index('Date'))
Out[677]:
ABC CDE
Date
2017-05-31 107.32 241.44
2017-06-02 107.57 244.17
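One caveat with .last(): GroupBy.last returns the last non-null value of each column, so if the final row of a month has a missing value, that column is filled from an earlier row and the output no longer matches a single observed row. GroupBy.tail(1) keeps the literal last row instead. A small sketch, with a hypothetical NaN injected into the question's data to show the difference:

```python
import numpy as np
import pandas as pd

# The question's rows, with a hypothetical missing CDE value on 2017-06-02.
df = pd.DataFrame(
    {"ABC": [107.00, 107.27, 107.32, 107.10, 107.57],
     "CDE": [241.71, 241.50, 241.44, 243.36, np.nan]},
    index=pd.to_datetime(["2017-05-26", "2017-05-30", "2017-05-31",
                          "2017-06-01", "2017-06-02"]).rename("Date"),
)

m = df.index.to_period("M")

# last(): last non-null value per column, so June's CDE falls back to 243.36
via_last = df.groupby(m).last()

# tail(1): the actual last row of each month, NaN included, original index kept
via_tail = df.groupby(m).tail(1)

print(via_last)
print(via_tail)
```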
Upvotes: 0
Reputation: 862406
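You need to copy the index into a new column before resampling, so that last() keeps the actual last date of each month, and then set that column back as the index with set_index: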
# 'BM' is business month end; pandas >= 2.2 spells this alias 'BME'
out = df.assign(Date=df.index).resample('BM').last().set_index('Date')
print(out)
ABC CDE
Date
2017-05-31 107.32 241.44
2017-06-02 107.57 244.17
But if you only need to group by calendar month (monthly periods) rather than business-month bins:
m = df.index.to_period('M')
out = df.reset_index().groupby(m).last().set_index('Date')
print(out)
ABC CDE
Date
2017-05-31 107.32 241.44
2017-06-02 107.57 244.17
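A practical difference between the resample variant and the period-groupby variant: resample creates one bin for every month between the first and last timestamp, so a month with no observations comes back as an all-NaN row whose carried Date is NaT, whereas grouping by period only produces groups that actually occur. A sketch with hypothetical data that skips June entirely (the offset object again stands in for the 'BM' alias):

```python
import pandas as pd

# Hypothetical data: observations in May and July 2017, none in June.
df = pd.DataFrame(
    {"ABC": [107.32, 108.00], "CDE": [241.44, 245.00]},
    index=pd.to_datetime(["2017-05-31", "2017-07-28"]).rename("Date"),
)

# Resample variant: June gets its own empty bin, so its Date comes back as NaT.
by_resample = (df.assign(Date=df.index)
                 .resample(pd.offsets.BusinessMonthEnd())
                 .last()
                 .set_index("Date"))

# Period-groupby variant: only months present in the data appear.
m = df.index.to_period("M")
by_period = df.reset_index().groupby(m).last().set_index("Date")

print(by_resample)
print(by_period)
```

If empty months should simply be dropped from the resample result, by_resample[by_resample.index.notna()] removes the NaT row.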
Upvotes: 7