How to replace pandas DataFrame row values with respect to month

Question

I have a pandas time-series DataFrame with a DatetimeIndex. I am trying to replace each daily value with the long-term monthly average value. For example:

If my two-year time-series DataFrame is something like:

import numpy as np
import pandas as pd

df = pd.DataFrame({'data': np.random.rand(731)}, index=pd.date_range('2000', periods=731))

Monthly mean:

mon_mean = df.groupby(df.index.month).mean()

And the long-term average looks like:

1   0.497286
2   0.536500
3   0.468002
4   0.477769
5   0.543201
6   0.520326
7   0.460261
8   0.524335
9   0.521869
10  0.516423
11  0.458476
12  0.494853

So what I want is to replace all the daily values in January with the long-term January average, i.e. 0.497286, and so on for the other months. But I was not able to do that.

jezrael · Accepted Answer

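Use GroupBy.transform to create a new column filled with the aggregated values. Unlike a plain aggregation, transform returns a result with the same index as the original DataFrame, so each row receives the mean of its own month: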

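import numpy as np
import pandas as pd

np.random.seed(2019)

df = pd.DataFrame({'data': np.random.rand(731)}, index=pd.date_range('2000', periods=731))

# Group by calendar month (1-12) and broadcast each group's mean back to its rows
df['mon'] = df.groupby(df.index.month)['data'].transform('mean')
print(df)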

                data       mon
2000-01-01  0.903482  0.482155
2000-01-02  0.393081  0.482155
2000-01-03  0.623970  0.482155
2000-01-04  0.637877  0.482155
2000-01-05  0.880499  0.482155
             ...       ...
2001-12-27  0.755412  0.519518
2001-12-28  0.858582  0.519518
2001-12-29  0.884738  0.519518
2001-12-30  0.265324  0.519518
2001-12-31  0.948137  0.519518

[731 rows x 2 columns]
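If the goal is to overwrite the daily values themselves rather than add a second column, the same idea can be sketched as below. This is only an illustration under the setup used above; the names `replaced` and `mapped` are made up for this example, and the second variant (looking each row's month up in the 12-row `mon_mean` Series from the question) is an alternative, not part of the answer above.

```python
import numpy as np
import pandas as pd

np.random.seed(2019)
df = pd.DataFrame({'data': np.random.rand(731)},
                  index=pd.date_range('2000', periods=731))

# Long-term mean per calendar month (index 1..12), as in the question
mon_mean = df.groupby(df.index.month)['data'].mean()

# Variant 1: overwrite 'data' with the group mean broadcast by transform
replaced = df.copy()
replaced['data'] = replaced.groupby(replaced.index.month)['data'].transform('mean')

# Variant 2: look up each row's month number in mon_mean
mapped = pd.Series(df.index.month.map(mon_mean).to_numpy(),
                   index=df.index, name='data')

print(replaced.head())
```

Both variants should give the same values; transform avoids the intermediate lookup table, while the map version is handy when `mon_mean` has already been computed.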
