Reputation: 121
I'd like to compute the mean (and quantiles) across years on an xarray DataArray.
If the time sampling is a multiple of days, I can easily do something like this:
arr.groupby('time.dayofyear').mean('time')
But I can't find an easy way to do the same when the timestamps also include hours. (Right now I'm using a horrible trick.)
For example in this case:
import numpy as np
import pandas as pd
import xarray as xr
time = pd.date_range('2000-01-01', '2010-01-01', freq='6h')
arr = xr.DataArray(
    np.ones(len(time)),
    dims='time',
    coords={'time': ('time', time)}
)
I'm probably missing something; I'm not very experienced with pandas and xarray. Do you have any tips?
Thank you very much.
Upvotes: 0
Views: 570
Reputation: 127
My understanding of the question is that you either want to do a groupby operation over two variables simultaneously, or group by something that is not an attribute of the xarray DatetimeAccessor.
Something you might look at is xarray.apply_ufunc. Below is some code that I used to get grouped means by year and by month.
def _grouped_mean(
        data: np.ndarray,
        months: np.ndarray,
        years: np.ndarray) -> np.ndarray:
    """similar to grouping year_month MultiIndex, but faster.
    Should be used wrapped by _wrapped_grouped_mean"""
    unique_months = np.sort(np.unique(months))
    unique_years = np.sort(np.unique(years))
    old_shape = list(data.shape)
    new_shape = old_shape[:-1]
    new_shape.append(unique_months.shape[0])
    new_shape.append(unique_years.shape[0])
    output = np.zeros(new_shape)
    for i_month, j_year in np.ndindex(output.shape[2:]):
        indices = np.intersect1d(
            (months == unique_months[i_month]).nonzero(),
            (years == unique_years[j_year]).nonzero()
        )
        output[:, :, i_month, j_year] =\
            np.mean(data[:, :, indices], axis=-1)
    return output
def _wrapped_grouped_mean(da: xr.DataArray) -> xr.DataArray:
    """similar to grouping by a year_month MultiIndex, but faster.
    Wraps a numpy-style function with xr.apply_ufunc
    """
    Y = xr.apply_ufunc(
        _grouped_mean,
        da,
        da.time.dt.month,
        da.time.dt.year,
        input_core_dims=[['lat', 'lon', 'time'], ['time'], ['time']],
        output_core_dims=[['lat', 'lon', 'month', 'year']],
    )
    Y = Y.assign_coords(
        {'month': np.sort(np.unique(da.time.dt.month)),
         'year': np.sort(np.unique(da.time.dt.year))})
    return Y
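For comparison, here is a minimal sketch of the plain-groupby route that the docstrings above allude to. It assumes a small random (lat, lon, time) array, and the combined `year_month` key (for example 200001 for January 2000) is a hypothetical name introduced only for this illustration. It returns one stacked `year_month` dimension rather than separate `month` and `year` dimensions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Illustrative data only: two years of daily values on a tiny 2x3 grid.
time = pd.date_range('2000-01-01', '2001-12-31', freq='D')
da = xr.DataArray(
    np.random.rand(2, 3, len(time)),
    dims=('lat', 'lon', 'time'),
    coords={'time': time},
)

# Hypothetical combined key: year * 100 + month, e.g. 200001.
year_month = da.time.dt.year * 100 + da.time.dt.month
da = da.assign_coords(year_month=('time', year_month.values))

# One mean per (year, month) pair, reduced over the time dimension.
monthly = da.groupby('year_month').mean('time')
```

This is likely slower than the apply_ufunc version on large arrays, but it can be handy for checking that both approaches give the same numbers.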
Upvotes: 0
Reputation: 121
Sorry, my question was probably not clear. Consider only the quantiles. My expected output is something like this:
<xarray.DataArray (hours: 1464, quantile: 3)>
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
...,
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* quantile (quantile) float64 0.25 0.5 0.75
* hours (hours) int64 6 12 18 24 30 36 42 48 54 60 66 72 ...
Here, hours are the hours elapsed since the beginning of the year. Instead of hours, something like a MultiIndex with dayofyear and hour (of day) would also be fine. I have a tricky way to do it (reindexing with a MultiIndex and unstacking the time dimension), but it's really horrible. I think there must be an easier and more elegant way to do it.
Thank you very much.
Upvotes: 0
Reputation: 6434
If you want daily averages, resample is the best tool for the job:
daily = arr.resample(time='D').mean('time')
Then, you can use groupby to calculate quantiles for each day of year:
quantiles_by_dayofyear = daily.groupby('time.dayofyear').apply(
    xr.DataArray.quantile, q=[0.25, 0.5, 0.75])
print(quantiles_by_dayofyear)
Yields:
<xarray.DataArray (dayofyear: 366, quantile: 3)>
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
...,
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* quantile (quantile) float64 0.25 0.5 0.75
* dayofyear (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
We should probably add the quantile method to xarray's list of groupby reduce methods but this should work for now.
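If you need to keep the sub-daily resolution instead of averaging to days first, one possible approach is to group on an "hours since the start of the year" key. The sketch below is not the only way to do it: the `hours` coordinate is a name made up for this example, and it assumes an xarray release recent enough that groupby objects expose a `quantile` method.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same 6-hourly example data as in the question.
time = pd.date_range('2000-01-01', '2010-01-01', freq='6h')
arr = xr.DataArray(np.ones(len(time)), dims='time', coords={'time': time})

# Hours elapsed since 00:00 on 1 January of each year: 0, 6, 12, ...
hours = (arr.time.dt.dayofyear - 1) * 24 + arr.time.dt.hour
arr = arr.assign_coords(hours=('time', hours.values))

# Assumes a recent xarray where GroupBy objects have .quantile.
quantiles_by_hour = arr.groupby('hours').quantile(
    [0.25, 0.5, 0.75], dim='time')
```

With the 6-hourly data above this gives 1464 groups (366 days times 4 samples per day); slots that only exist in leap years are based on fewer samples than the rest.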
Upvotes: 2
Reputation: 13
For the daily average, I would suggest using the resample function. If I understood the question correctly, this should give you daily averages. You can then use these daily averages for your groupby dayofyear operation.
import numpy as np
import pandas as pd
import xarray as xr
time = pd.date_range('2000-01-01', '2010-01-01', freq='6h')
arr = xr.DataArray(
    np.ones(len(time)),
    dims='time',
    coords={'time': ('time', time)}
)
daily = arr.resample(time='D').mean('time')
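The follow-up groupby step mentioned above could look roughly like this. It is a minimal sketch that reuses the question's example data, and `mean_by_dayofyear` is just an illustrative variable name.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same 6-hourly example data as in the question.
time = pd.date_range('2000-01-01', '2010-01-01', freq='6h')
arr = xr.DataArray(np.ones(len(time)), dims='time', coords={'time': time})

# Step 1: collapse the 6-hourly samples to one value per calendar day.
daily = arr.resample(time='D').mean()

# Step 2: average each day of the year across all years.
mean_by_dayofyear = daily.groupby('time.dayofyear').mean('time')
```

Note that this discards the hour-of-day information, so it only fits if daily values are what you are after.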
Upvotes: 0