Reputation: 3375
I have a dataframe with daily transaction amounts. The date is the index, ds, and the transaction amount is the column y:
ds y
2017-08-16 10.0
2017-10-26 21.7
2017-11-04 5.0
2017-11-13 10.0
2017-11-27 14.0
The data only goes up to December 2019 as confirmed by:
print(df.index.max())
Timestamp('2019-12-31 00:00:00')
I want to resample it to a weekly transaction amount:
# Resample from daily to weekly
df = df.resample('W').mean()
# Backfill any missing values (fillna(method='bfill') is deprecated in recent pandas)
df.bfill(inplace=True)
And now the data goes up to Jan 2020:
print(df.index.max())
Timestamp('2020-01-05 00:00:00')
It's only a week into the future, so I'm not really worried, but I don't understand it. Why does my data extend into 2020 after resampling to weekly?
Upvotes: 0
Views: 822
Reputation: 80
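With weekly offsets, resample labels each bucket by its right edge by default. 'W' means weeks ending on Sunday, and 2019-12-31 is a Tuesday, so it falls in the bucket that ends on (and is labelled) 2020-01-05. If you'd like to label buckets by their left edge instead, you could do: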
import pandas as pd

# Sample data from the question, plus one row on the last available date
df_def = {
    'ds': ['2017-08-16', '2017-10-26', '2017-11-04', '2017-11-13', '2017-11-27', '2019-12-31'],
    'y': [10.0, 21.7, 5.0, 10.0, 14.0, 999.0]
}
df = pd.DataFrame(df_def)
df['ds'] = pd.to_datetime(df.ds)
df = df.set_index('ds')
# Label each weekly bucket by its left edge, then backfill the empty weeks
df = df.resample('W', label='left').mean().bfill()
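To see where the extra week comes from, here is a minimal sketch that compares the index maximum under different labelling choices. The three-row frame is a made-up miniature of the question's data, and the 'W-TUE' variant is an alternative I'm adding for illustration, not something from the question:

```python
import pandas as pd

# Hypothetical miniature of the question's data; the last observation is 2019-12-31.
df = pd.DataFrame(
    {"y": [10.0, 14.0, 999.0]},
    index=pd.DatetimeIndex(["2017-11-13", "2017-11-27", "2019-12-31"], name="ds"),
)

print(df.index.max().day_name())  # Tuesday

# Default for 'W' (weeks ending Sunday): buckets are labelled by their right edge.
right = df.resample("W").mean()

# Same buckets, but labelled by their left edge (the preceding Sunday).
left = df.resample("W", label="left").mean()

# Alternative (my assumption about what might be wanted): anchor weeks on Tuesday
# so the final bucket ends exactly on the last observed date.
anchored = df.resample("W-TUE").mean()

print(right.index.max())     # 2020-01-05 00:00:00
print(left.index.max())      # 2019-12-29 00:00:00
print(anchored.index.max())  # 2019-12-31 00:00:00
```

Only the labels differ between the first two results. The rows that go into each weekly mean are the same, so the 2020 date does not mean any future data was invented.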
Upvotes: 1