SCool

Reputation: 3375

Pandas resampling from daily to weekly adds an extra week? Is this normal?

I have a dataframe with daily transaction amounts. The date is the index, ds, and the transaction amount is the column y:

ds          y              
2017-08-16  10.0
2017-10-26  21.7
2017-11-04   5.0
2017-11-13  10.0
2017-11-27  14.0

The data only goes up to December 2019 as confirmed by:

print(df.index.max())

Timestamp('2019-12-31 00:00:00')

I want to resample it to a weekly transaction amount:

# Resample from daily to weekly
df = df.resample('W').mean()

# Backfill any missing values
df.bfill(inplace=True)

And now the data goes up to Jan 2020:

print(df.index.max())

Timestamp('2020-01-05 00:00:00')

It's not very far into the future, just a week, so I am not really worried. But I don't understand it. Why does my data now extend into 2020 after resampling to weekly?

Upvotes: 0

Views: 822

Answers (1)

user2395059

Reputation: 80

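By default, resample labels each weekly bucket with its right edge. 'W' is an alias for 'W-SUN', so every bucket is labelled with the Sunday that ends the week. 2019-12-31 is a Tuesday, so it falls in the bucket labelled 2020-01-05. The last row still only contains your December data; only its label is later than your last date. If you'd like to label each bucket by its left edge instead, you could do: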

import pandas as pd

# Sample data shaped like the question's, with the last row on 2019-12-31
df_def = {
    'ds': ['2017-08-16','2017-10-26', '2017-11-04','2017-11-13','2017-11-27','2019-12-31'],
    'y': [10.0,21.7,5.0,10.0,14.0,999.0]
}

df = pd.DataFrame(df_def)
df['ds'] = pd.to_datetime(df.ds)
df = df.set_index('ds')

# label='left' names each weekly bucket by its left edge instead of its right edge
df.resample('W', label='left').mean().bfill()
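
To see the difference side by side, here is a minimal sketch comparing the default labels with label='left'. The three dates and values are made up for illustration; only the last date, 2019-12-31, matches the question.

```python
import pandas as pd

# Hypothetical data: three days around the end of 2019 (values are invented)
idx = pd.to_datetime(['2019-12-23', '2019-12-30', '2019-12-31'])
df = pd.DataFrame({'y': [1.0, 2.0, 3.0]}, index=idx)

# The last date is a Tuesday, so its Sunday-ending week closes in 2020
last_day = df.index.max()
print(last_day.day_name())

# Default: each weekly bucket is labelled with its right edge (the Sunday)
right = df.resample('W').mean()

# label='left': each bucket is labelled with its left edge (the previous Sunday)
left = df.resample('W', label='left').mean()

print(right.index.max())
print(left.index.max())
```

Both results hold the same weekly means; only the index labels shift by one week, so with label='left' the index no longer runs past the last date in the data.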

Upvotes: 1
