me7250

Reputation: 11

Calculate mean of several months over a longer time period

I have netCDF monthly temperature data for several decades and would like to calculate the 6-month mean over all years for each month of the year. E.g. to get the 6-month mean for May, I would have to sum May and the five months before it (December, January, February, March, April) for every year and then calculate the mean. I tried to apply this guide, but with a 6-month mean instead of a seasonal mean.

import pandas as pd
import xarray as xr
import numpy as np

ds = xr.open_dataset("...\\data.nc")

# Make a DataArray with the number of days in each month, size = len(time)
month_length = ds.time.dt.days_in_month

# Calculate the weights by grouping by 6 months
weights = xr.core.groupby.DataArrayGroupBy(month_length, 'time', grouper=pd.Grouper(freq='6MS')) / xr.core.groupby.DataArrayGroupBy(month_length, 'time', grouper=pd.Grouper(freq='6MS')).sum()
print(weights)

# Test that the sum of the weights for each 6-month group is 1.0
np.testing.assert_allclose(xr.core.groupby.DataArrayGroupBy(weights, 'time', grouper=pd.Grouper(freq='6MS')).sum().values, np.ones(2))

# Calculate the weighted average
ds_weighted = xr.core.groupby.DataArrayGroupBy((ds * weights), 'time', grouper=pd.Grouper(freq='6MS')).sum(dim='time')

ds_weighted.to_netcdf(path="..\\output.nc")

But for some reason the weights do not seem to add up to 1.
Edit: I have now decided to try another approach to the weighting problem. First, I multiply the data by the number of days in each month:
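One likely reason, shown here on hypothetical synthetic data rather than the real file: grouping with `pd.Grouper(freq='6MS')` (essentially the same binning as the public `month_length.resample(time='6MS')`) splits the record into fixed, non-overlapping 6-month blocks. The weights then sum to 1 within each block, but several decades give far more than the two blocks that `np.ones(2)` expects, and no block is a trailing window ending at each month. The sketch uses an explicit `bin` coordinate (an assumption standing in for the 6MS grouper) so that it only relies on documented groupby arithmetic:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic time axis: two years of monthly data from January
time = pd.date_range("2000-01-01", periods=24, freq="MS")
month_length = xr.DataArray(
    np.asarray(time.days_in_month), coords={"time": time}, dims="time"
)

# Label fixed, non-overlapping 6-month bins (0,0,0,0,0,0,1,...); assumed to
# mirror a 6MS grouping for regular monthly data that starts at a bin edge
month_length = month_length.assign_coords(bin=("time", np.arange(time.size) // 6))

# Day-based weights that sum to 1 inside each bin
weights = month_length.groupby("bin") / month_length.groupby("bin").sum()
per_bin = weights.groupby("bin").sum()

print(per_bin.sizes["bin"])  # 4 bins for 24 months, so np.ones(2) does not fit
np.testing.assert_allclose(per_bin.values, np.ones(per_bin.sizes["bin"]))
```

With decades of data the number of bins grows accordingly, which is why the shape of the expected array has to come from the grouped result rather than being hard-coded.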

month_length = ds.time.dt.days_in_month
ds_multbymonth = ds * month_length

Then I calculate the rolling sum over a 6-month window:

ds_rolledSum = ds_multbymonth.rolling(time=6, min_periods=6).sum()
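A quick check of how the window is aligned: with xarray's default `center=False`, `rolling(...).sum()` labels each window with its last time step, so the value stored at May covers December to May. A small sketch on hypothetical synthetic values starting in December:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical series: values 1..12 for December 1999 to November 2000
time = pd.date_range("1999-12-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(1.0, 13.0), coords={"time": time}, dims="time")

# Trailing 6-step window, labelled with the last month of each window
rolled = da.rolling(time=6, min_periods=6).sum()

# May 2000 holds December 1999 + January..May 2000 = 1+2+3+4+5+6
print(rolled.sel(time=pd.Timestamp("2000-05-01")).item())  # 21.0
# The first 5 steps have incomplete windows and are NaN
print(int(rolled.isnull().sum()))  # 5
```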

Lastly, I group the summed variables by calendar month, so that I can later divide them by the number of days each 6-month sum has aggregated:

sumSixMonths = ds_rolledSum.groupby('time.month').sum()
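One way to finish this approach, sketched on a hypothetical synthetic dataset standing in for the netCDF file (the variable name `temp` is an assumption): also take the rolling sum of the day counts, sum both rolling results per calendar month, and divide. The value for month m is then the day-weighted mean over m and the five preceding months, pooled over all years:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for the netCDF file: 3 years of monthly values
time = pd.date_range("2000-01-01", periods=36, freq="MS")
ds = xr.Dataset({"temp": ("time", np.arange(36, dtype=float))}, coords={"time": time})

# Number of days in each month, aligned on the time axis
month_length = ds.time.dt.days_in_month

# Weight every monthly value by its number of days
ds_multbymonth = ds * month_length

# Trailing 6-month sums of the weighted values and of the day counts;
# incomplete windows at the start of the record become NaN in both
rolled_sum = ds_multbymonth.rolling(time=6, min_periods=6).sum()
rolled_days = month_length.rolling(time=6, min_periods=6).sum()

# Pool over all years per calendar month (NaNs are skipped), then divide
sum_six_months = rolled_sum.groupby("time.month").sum()
days_six_months = rolled_days.groupby("time.month").sum()
six_month_mean = sum_six_months / days_six_months

print(six_month_mean["temp"].sel(month=5).item())  # December-to-May mean
```

Pooling the day totals weights every window by its length. Taking `(rolled_sum / rolled_days).groupby("time.month").mean()` instead averages the per-year means; the two differ only slightly, e.g. when a window contains a leap-year February.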

It is quite an inelegant solution; maybe someone here has a better suggestion.

Upvotes: 1

Views: 287

Answers (1)

Robert Wilson

Reputation: 3397

If you are working on Linux/OSX, you could do this using my package nctoolkit (https://nctoolkit.readthedocs.io/en/latest/index.html).

You say your data is monthly, so what you want is a rolling mean with a window of 6. However, rolling means are typically calculated using times both before and after each step. So in the code below I have used the rolling sum and then divided by 6. This calculates the 6-month mean, selects the month of May and then converts the result to xarray, if needed.

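import nctoolkit as nc

ds = nc.open_data("...\\data.nc")
# Sum over a 6-month window, then divide by 6 to get the mean
ds.rolling_sum(window=6)
ds.divide(6)
# Keep only May
ds.select(month=5)
# Convert to an xarray Dataset, if needed
ds_xr = ds.to_xarray()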
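For comparison, a rough xarray analogue of the same unweighted idea (every month counts equally), on hypothetical synthetic data. It assumes a trailing window labelled at its last month, which is xarray's default with `center=False`; this may not match how nctoolkit timestamps its rolling output:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in data: two years of monthly values 0..23
time = pd.date_range("2000-01-01", periods=24, freq="MS")
ds = xr.Dataset({"temp": ("time", np.arange(24, dtype=float))}, coords={"time": time})

# Trailing 6-month sum divided by 6 (same as a rolling mean for full windows)
six_month_mean = ds.rolling(time=6, min_periods=6).sum() / 6

# Keep only the windows that end in May
may = six_month_mean.sel(time=six_month_mean.time.dt.month == 5)
```

Here May 2000 is NaN (its window would start before the record), and May 2001 is the mean of December 2000 to May 2001. Calling `may.mean("time")` then skips the NaN and gives a single May value over all years.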

Upvotes: 2
