I have a DataFrame with an hourly DatetimeIndex and a single column of values. I want to add another column containing the yearly mean of those values.
Here is what I tried:
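```python
import pandas as pd

df = pd.DataFrame(range(8760*2), index=pd.date_range('2015-12-30', freq='H', periods=8760*2))
df1 = df.resample('A').mean()  # yearly means, labelled at the year-end timestamp
df1.rename(columns={0: 'mean'}, inplace=True)
# Spread each yearly mean back onto the hourly index
df1.reindex(df.index, method='bfill').head(48)
```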
This gives the following df1:
```
2015-12-31 23.5
2016-12-31 4439.5
2017-12-31 13175.5
```
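As a sanity check on these figures: the index starts at 2015-12-30 00:00 and the values are simply 0, 1, 2 and so on, so each yearly mean is the midpoint of a run of consecutive integers. The sketch below recomputes them in plain Python, assuming only that 2016 is a leap year (8784 hours) and that 2017 is cut short because the index holds 8760*2 = 17520 hours in total.

```python
# Values are 0, 1, 2, ... in hourly order starting at 2015-12-30 00:00
total_hours = 8760 * 2

# Number of hourly rows falling in each calendar year
hours_per_year = {
    2015: 2 * 24,    # only Dec 30 and Dec 31
    2016: 366 * 24,  # leap year: 8784 hours
}
hours_per_year[2017] = total_hours - sum(hours_per_year.values())  # partial year

yearly_mean = {}
start = 0
for year, n in hours_per_year.items():
    end = start + n - 1
    # The mean of the consecutive integers start..end is their midpoint
    yearly_mean[year] = (start + end) / 2
    start = end + 1

print(yearly_mean)  # {2015: 23.5, 2016: 4439.5, 2017: 13175.5}
```

So the resampled means themselves are correct; the problem only appears when they are spread back onto the hourly index.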
The reindexed frame looks like this:
```
2015-12-30 00:00:00 23.5
...
2015-12-30 23:00:00 23.5
2015-12-31 00:00:00 23.5
2015-12-31 01:00:00 4439.5
2015-12-31 02:00:00 4439.5
2015-12-31 03:00:00 4439.5
2015-12-31 04:00:00 4439.5
...
2015-12-31 22:00:00 4439.5
2015-12-31 23:00:00 4439.5
```
As you can see, there is a problem: the backfill applies the 2015 mean only up to 2015-12-31 00:00:00, i.e. midnight at the start of the last day of the year. From 01:00 onwards, the rows of December 31 already get the 2016 mean.
Does anyone have a solution to this problem?
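The cause shows up in the labels that resample produces: each annual bin covers the whole of December 31, but it is labelled with the timestamp at midnight at the start of that day. A backfilling reindex matches every hourly timestamp to the next label at or after it, so only timestamps up to that midnight pick up the current year. The sketch below reproduces this; it names the column 'value' (an illustrative choice) and passes pd.offsets.Hour() and pd.offsets.YearEnd() instead of the 'H' and 'A' strings, because recent pandas versions have renamed those string aliases.

```python
import numpy as np
import pandas as pd

# Hourly data as in the question ('value' is an illustrative column name)
idx = pd.date_range('2015-12-30', periods=8760 * 2, freq=pd.offsets.Hour())
df = pd.DataFrame({'value': np.arange(len(idx), dtype=float)}, index=idx)

# Annual means: the labels sit at midnight at the START of Dec 31
yearly = df.resample(pd.offsets.YearEnd()).mean()
print(yearly.index[0])  # 2015-12-31 00:00:00

# Backfill maps each hour to the next label at or after it
filled = yearly.reindex(df.index, method='bfill')
print(filled.at[pd.Timestamp('2015-12-31 00:00'), 'value'])  # 23.5 (2015 mean)
print(filled.at[pd.Timestamp('2015-12-31 01:00'), 'value'])  # 4439.5 (already the 2016 mean)
```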
Thanks very much in advance.
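Group by the year of the index and use `transform('mean')`, which broadcasts each year's mean back onto every row of that year, so no reindexing is needed:
```python
import pandas as pd

df = pd.DataFrame(range(8760*2), dtype='float',
                  index=pd.date_range('2015-12-30', freq='H', periods=8760*2))
df1 = df.groupby(df.index.year).transform('mean')
```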
yields
```
...
2015-12-31 23:00:00 23.5
2016-01-01 00:00:00 4439.5
...
```
Note: I changed df's dtype to float so that the mean would also be of dtype float.
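The question asks for the mean as an additional column next to the original values. A minimal sketch of that, assuming the values live in a column named 'value' (a hypothetical name; the question uses the default column 0): selecting that column before transform returns a Series aligned with df, which can be assigned directly. For comparison, it also keeps the resample step but looks the yearly means up by year instead of by timestamp, which sidesteps the midnight-label problem; both columns should agree.

```python
import numpy as np
import pandas as pd

# Hourly data; 'value' is an illustrative column name
idx = pd.date_range('2015-12-30', periods=8760 * 2, freq=pd.offsets.Hour())
df = pd.DataFrame({'value': np.arange(len(idx), dtype=float)}, index=idx)

# Each row receives the mean of its own calendar year
df['mean'] = df.groupby(df.index.year)['value'].transform('mean')

# Alternative: keep resample, but re-key the yearly means by year number
# so each hour is matched on its year rather than on a timestamp
yearly = df['value'].resample(pd.offsets.YearEnd()).mean()
by_year = yearly.set_axis(yearly.index.year)
df['mean_via_resample'] = df.index.year.map(by_year)

print(df.loc['2015-12-31 22:00':'2016-01-01 01:00'])
```

The groupby version is the more direct of the two; the lookup by year is mainly useful if the yearly means are already computed with resample elsewhere.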