K. Shores
K. Shores

Reputation: 1005

Monthly climatology across several years, repeated for each day in that month over all years

I need to find the monthly climatology of some data that has daily values across several years. The code below sufficiently summarizes what I am trying to do. monthly_mean holds the averages over all years for specific months. I then need to assign that average in a new column for each day in a specific month over all of the years. For whatever reason, my assignment, df['A Climatology'] = group['A Climatology'], is only assigning values to the month of December. How can I make the assignment happen for all months?

data = np.random.randint(5,30,size=(365*3,3))
df = pd.DataFrame(data, columns=['A', 'B', 'C'], index=pd.date_range('2021-01-01', periods=365*3))
df['A Climatology'] = np.nan

monthly_mean = df['A'].groupby(df.index.month).mean()
for month, group in df.groupby(df.index.month):
    group['A Climatology'] = monthly_mean.loc[month]
    df['A Climatology'] = group['A Climatology']
    
df

Upvotes: 0

Views: 584

Answers (2)

cazman
cazman

Reputation: 1492

merged_df = pd.merge(df, 
             monthly_mean, 
             how='left', 
             left_on=df.index.month, 
             right_on=monthly_mean.index).drop('key_0', axis=1).set_index(df.index)

            A_x B   C   A Climatology   A_y
2021-01-01  12  20  18  NaN             16.752688
2021-01-02  24  26  11  NaN             16.752688
2021-01-03  18  27  15  NaN             16.752688
2021-01-04  18  5   22  NaN             16.752688
2021-01-05  10  15  25  NaN             16.752688
... ... ... ... ... ...
2023-12-27  19  15  11  16.11828        16.118280
2023-12-28  16  23  25  16.11828        16.118280
2023-12-29  6   13  16  16.11828        16.118280
2023-12-30  10  9   14  16.11828        16.118280
2023-12-31  15  22  17  16.11828        16.118280

Or to do this without creating a new data frame:

df = df.reset_index().merge(monthly_mean, how='left', left_on=df.index.month, right_on=monthly_mean.index).set_index('index')

monthly_means:

1     16.752688
2     16.476190
3     16.795699
4     17.111111
5     17.795699
6     18.111111
7     16.806452
8     15.236559
9     15.600000
10    18.279570
11    16.555556
12    16.118280
Name: A, dtype: float64

Upvotes: 1

YoungTim
YoungTim

Reputation: 173

Your code is setting the column == to the group, so every iteration of your loop you're setting the df's values only for that group---which is why your df ends on December, the last month in the list.

monthly_mean = df['A'].groupby(df.index.month).mean()
for month, group in df.groupby(df.index.month):
    df.loc[lambda df: df.index.month == month, 'A Climatology'] = monthly_mean.loc[month]

Instead, you could directly set the df's values where the month == the iterable month.

Upvotes: 2

Related Questions