quantile method on groupby of xarray dataset

Question

I have a typical xarray Dataset containing 38 years of monthly data.

I am interested in calculating the quantile values for each month separately.


<xarray.Dataset>
Dimensions:        (lat: 26, lon: 71, time: 456)
Coordinates:
  * lat            (lat) float32 25.0 26.0 27.0 28.0 29.0 30.0 31.0 32.0 ...
  * lon            (lon) float32 -130.0 -129.0 -128.0 -127.0 -126.0 -125.0 ...
  * time           (time) datetime64[ns] 1979-01-31 1979-02-28 1979-03-31 ...
Data variables:
    var1         (time, lat, lon) float32 nan nan nan nan nan nan nan nan ...
    var2         (time, lat, lon) float32 nan nan nan nan nan nan nan nan ...
    var3         (time, lat, lon) float32 nan nan nan nan nan nan nan nan ...
    ......

For example, if I want the mean for each month I use:

ds.groupby('time.month').mean(dim='time')

But if I try

ds.groupby('time.month').quantile(0.75, dim='time')

I get

AttributeError: 'DatasetGroupBy' object has no attribute 'quantile'

However, according to the pandas documentation, quantile works on groupby objects.

In fact, I tried the following:

df_ds = xr.Dataset.to_dataframe(ds)
df_ds = df_ds.reset_index()
df_ds = df_ds.set_index('time')
df_ds.groupby(pd.Grouper(freq='M')).quantile(0.75)

and it works. Of course, this is a much simpler case because I have only one index; indeed, if I don't use reset_index/set_index to reduce the frame to a single index, pandas raises an error saying it cannot handle a MultiIndex.

So, can xarray do it? Perhaps using some apply/lambda combination?

I found a very inelegant way around it. It is feasible because I have only 4 variables (I could loop through the variable names, but I don't here):

# Group the masked data by calendar month; the monthly mean is only used
# as a template with the right (month, lat, lon) shape.
Data_clim_monthly_75g = ds.where(iok_conus_xarray).groupby('time.month')
Data_clim_monthly_75 = ds.where(iok_conus_xarray).groupby('time.month').mean(dim='time')

v1 = Data_clim_monthly_75['var1'].values
v2 = Data_clim_monthly_75['var2'].values
v3 = Data_clim_monthly_75['var3'].values
v4 = Data_clim_monthly_75['var4'].values
# k is the month number (1-12); axis 0 of each group is time.
for k, gp in Data_clim_monthly_75g:
    v1[k-1] = np.nanpercentile(gp['var1'].values, q=75, axis=0)
    v2[k-1] = np.nanpercentile(gp['var2'].values, q=75, axis=0)
    v3[k-1] = np.nanpercentile(gp['var3'].values, q=75, axis=0)
    v4[k-1] = np.nanpercentile(gp['var4'].values, q=75, axis=0)
# Overwrite the template values with the 75th percentiles.
Data_clim_monthly_75['var1'] = (('month', 'lat', 'lon'), v1)
Data_clim_monthly_75['var2'] = (('month', 'lat', 'lon'), v2)
Data_clim_monthly_75['var3'] = (('month', 'lat', 'lon'), v3)
Data_clim_monthly_75['var4'] = (('month', 'lat', 'lon'), v4)
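For reference, the same workaround could be written as a loop over the variable names. This is only a sketch on synthetic data: the small grid, the 3-year time axis, and the names demo_ds and demo_p75 are made up for illustration, and the masking step with iok_conus_xarray is left out.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real dataset: 3 years of monthly data on a 4x5 grid.
time = pd.date_range("1979-01-01", periods=36, freq="MS")
rng = np.random.default_rng(0)
demo_ds = xr.Dataset(
    {name: (("time", "lat", "lon"), rng.random((36, 4, 5))) for name in ["var1", "var2"]},
    coords={"time": time},
)

monthly = {}
for name in demo_ds.data_vars:
    labels, arrays = [], []
    # Iterating a groupby object yields (month number, sub-array) pairs.
    for month, group in demo_ds[name].groupby("time.month"):
        labels.append(int(month))
        # axis 0 is time, so this collapses the years within one calendar month.
        arrays.append(np.nanpercentile(group.values, q=75, axis=0))
    monthly[name] = (("month", "lat", "lon"), np.stack(arrays))

demo_p75 = xr.Dataset(monthly, coords={"month": labels})
```

This still goes through NumPy for the actual percentile, so it only removes the per-variable repetition, not the detour around xarray.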

I basically work around xarray. I still would love a solution within xarray.

jhamman · Accepted Answer

We have not added the quantile method to the groupby object yet. You can however apply arbitrary reduce functions to each group using the reduce method. In my example below, I apply np.nanpercentile to each group.

In [21]: ds
Out[21]:

<xarray.Dataset>
Dimensions:  (lat: 71, lon: 26, time: 456)
Coordinates:
  * time     (time) datetime64[ns] 1979-01-31 1979-02-28 1979-03-31 ...
Dimensions without coordinates: lat, lon
Data variables:
    var1     (time, lon, lat) float64 0.4286 0.4032 0.2178 0.7652 0.8108 ...
    var2     (time, lon, lat) float64 0.8259 0.3625 0.6556 0.7403 0.2381 ...

In [22]: ds.groupby('time.month').reduce(np.nanpercentile, dim='time', q=75)
Out[22]:
<xarray.Dataset>
Dimensions:  (lat: 71, lon: 26, month: 12)
Coordinates:
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Dimensions without coordinates: lat, lon
Data variables:
    var1     (month, lon, lat) float64 ...
    var2     (month, lon, lat) float64 ...
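If you want to check the reduce approach on something small, here is a minimal sketch with synthetic data (the dataset, its shape, and the names toy, p75 and p_low are made up for illustration). One thing to watch: np.nanpercentile takes q on a 0 to 100 scale, so the 75th percentile is q=75, while q=0.75 asks for the 0.75th percentile.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic data: 4 years of monthly values on a 3x2 grid.
time = pd.date_range("1979-01-01", periods=48, freq="MS")
rng = np.random.default_rng(42)
toy = xr.Dataset(
    {"var1": (("time", "lat", "lon"), rng.random((48, 3, 2)))},
    coords={"time": time},
)

# Extra keyword arguments (here q) are forwarded to the reducing function.
p75 = toy.groupby("time.month").reduce(np.nanpercentile, dim="time", q=75)

# For contrast: q=0.75 is the 0.75th percentile, which sits near each group's minimum.
p_low = toy.groupby("time.month").reduce(np.nanpercentile, dim="time", q=0.75)
```

Comparing p75 and p_low on your own data is a quick way to confirm which scale a given function expects.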

Edit: from xarray version 0.12.2 onwards GroupBy objects do have the GroupBy.quantile method you were looking for:

ds.groupby('time.month').quantile(q=0.75, dim='time')
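A self-contained sketch of the built-in method, again on made-up data (the names sample and by_month are only for illustration), assuming an xarray version that has GroupBy.quantile. Unlike np.nanpercentile, quantile takes q on a 0 to 1 scale.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic data: 5 years of monthly values on a 2x3 grid.
time = pd.date_range("1979-01-01", periods=60, freq="MS")
rng = np.random.default_rng(7)
sample = xr.Dataset(
    {"var1": (("time", "lat", "lon"), rng.random((60, 2, 3)))},
    coords={"time": time},
)

# One 75% quantile per calendar month, computed over the years at each grid point.
by_month = sample.groupby("time.month").quantile(0.75, dim="time")
```

The result has a month dimension of length 12 in place of time, and carries the requested q as a scalar quantile coordinate.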
