Reputation: 11
I have monthly temperature data in netCDF format covering several decades, and I would like to calculate the 6-month mean over all years for each month of the year. E.g., to get the 6-month mean for May, I would have to sum May and the five months before it (December, January, February, March, April) for every year and then calculate the mean. I tried to apply this guide, but with a six-month mean instead of a seasonal mean.
import pandas as pd
import xarray as xr
import numpy as np
ds = xr.open_dataset("...\\data.nc")
# Make a DataArray with the number of days in each month, size = len(time)
month_length = ds.time.dt.days_in_month
# Calculate the weights by grouping by 6 months
weights = xr.core.groupby.DataArrayGroupBy(month_length, 'time', grouper=pd.Grouper(freq='6MS')) / xr.core.groupby.DataArrayGroupBy(month_length, 'time', grouper=pd.Grouper(freq='6MS')).sum()
print(weights)
# Test that the sum of the weights for each season is 1.0
np.testing.assert_allclose(xr.core.groupby.DataArrayGroupBy(weights, 'time', grouper=pd.Grouper(freq='6MS')).sum().values, np.ones(2))
# Calculate the weighted average
ds_weighted = xr.core.groupby.DataArrayGroupBy((ds * weights), 'time', grouper=pd.Grouper(freq='6MS')).sum(dim='time')
ds_weighted.to_netcdf(path="..\\output.nc")
However, for some reason the weights do not seem to add up to 1.
Edit: I have now decided to try another approach to the weight problem. First, I multiply the data by the number of days in each month:
month_length = ds.time.dt.days_in_month
ds_multbymonth = ds * month_length
Then I calculate the rolling sum for a time period of 6 months.
ds_rolledSum = ds_multbymonth.rolling(time=6, min_periods=6).sum().stack().reset_index('time')
Lastly, I group the rolling sums by month so that I can later divide them by the number of days covered by each 6-month sum:
sumSixMonths = ds_rolledSum.groupby('time.month').sum()
It is quite an inelegant solution; maybe someone here has a better suggestion.
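For reference, the steps in this edit (multiply by days in month, take a 6-month rolling sum, then divide by the days covered) can be sketched end to end as follows. This is only a minimal sketch: the synthetic `temp` series is a hypothetical stand-in for the real file, and dividing by a rolling sum of the month lengths is an assumption about what "divide by the number of days" should mean here.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the monthly temperature data (hypothetical values).
time = pd.date_range("2000-01-01", periods=48, freq="MS")
temp = xr.DataArray(
    np.arange(48, dtype=float), coords={"time": time}, dims="time", name="temp"
)

# Days in each month, cast to float so the rolling sum can hold NaN.
month_length = temp.time.dt.days_in_month.astype(float)

# Trailing 6-month sums of (value * days) and of the days alone.
weighted_sum = (temp * month_length).rolling(time=6, min_periods=6).sum()
days_sum = month_length.rolling(time=6, min_periods=6).sum()

# Day-weighted 6-month mean, labelled by the last month of each window.
six_month_mean = weighted_sum / days_sum

# Average each calendar month over all years; incomplete windows are NaN
# and are skipped by the default skipna behaviour of mean().
climatology = six_month_mean.groupby("time.month").mean("time")
```

Taking the mean of the already-normalised windows in the last step avoids having to track the day counts separately through the `groupby`.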
Upvotes: 1
Views: 287
Reputation: 3397
If you are working on Linux/OSX, you could do this using my package nctoolkit (https://nctoolkit.readthedocs.io/en/latest/index.html).
You say your data is monthly, so what you want is a rolling mean with a window of 6. However, rolling means are typically calculated using times both before and after each time step, so in the code below I have used the rolling sum and then divided by 6. This will calculate the 6-month mean, select the month of May, and then convert to xarray if needed.
import nctoolkit as nc
ds = nc.open_data("...\\data.nc")
ds.rolling_sum(window = 6)
ds.divide(6)
ds.select(month=5)
ds_xr = ds.to_xarray()
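If nctoolkit is not an option, an unweighted trailing 6-month mean can also be sketched in plain xarray. This is not the nctoolkit implementation, just an illustration of the same idea on a hypothetical synthetic series; it does not check how nctoolkit labels its windows. In xarray, `center=False` (the default) labels each window by its last time step, so a value stamped May covers December through May.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series standing in for the real file (hypothetical values).
time = pd.date_range("2000-01-01", periods=36, freq="MS")
temp = xr.DataArray(np.arange(36, dtype=float), coords={"time": time}, dims="time")

# Trailing window: each value is the mean of that month and the five before it.
trailing = temp.rolling(time=6, center=False, min_periods=6).mean()

# Keep only the windows that end in May.
may = trailing.sel(time=trailing.time.dt.month == 5)

# Mean over all years; the first May has an incomplete window (NaN) and is skipped.
may_mean = may.mean("time")
```

Replacing the `sel` step with `trailing.groupby("time.month").mean("time")` would give the same statistic for all twelve months at once.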
Upvotes: 2