alextc

Reputation: 3515

xarray - how to group or resample time series data by yyyy-01-01 and yyyy-07-01 over multiple years

My time series data is an xarray DataArray object called da_output_halfyearly:

<xarray.DataArray '__xarray_dataarray_variable__' (time: 10, latitude: 106, longitude: 193)>
dask.array<shape=(4, 106, 193), dtype=int32, chunksize=(2, 106, 193)>
Coordinates:
  * latitude   (latitude) float32 -39.2 -39.149525 ... -33.950478 -33.9
  * longitude  (longitude) float32 140.8 140.84792 140.89584 ... 149.95209 150.0
  * time       (time) datetime64[ns] 1972-01-01 1972-07-01 1973-01-01 1973-07-01 ... 1981-01-01 1981-07-01

I need to group/resample the data into two time groups, "yyyy-01-01" and "yyyy-07-01", and take the std() of the data in each group.

I was able to use positional slicing to split the data into two separate DataArray objects:

da_all_jan_jun = da_output_halfyearly[::2]
da_all_jul_dec = da_output_halfyearly[1::2]

da_jan_jun_std = da_all_jan_jun.std(dim='time')
da_jul_dec_std = da_all_jul_dec.std(dim='time')

However, the output DataArray objects lost the time dimension.

Upvotes: 0

Views: 646

Answers (1)

spencerkclark

Reputation: 2097

Let's say you are starting from the following setup:

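import pandas as pd
import xarray as xr

# Monthly (month-end) timestamps; newer pandas versions spell this alias 'ME'
times = pd.date_range('2000', periods=100, freq='M')
da = xr.DataArray(range(len(times)), [('time', times)])
# Aggregate into half-yearly bins labeled by the bin start (Jan 1 or Jul 1)
resampled = da.resample(time='6MS', closed='left').sum('time')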

A quick way to achieve something close to your desired result is to use groupby, grouping by the month of the year:

result = resampled.groupby('time.month').std('time')

This will leave you with a resulting DataArray that has a 'month' dimension, with values of either 1 or 7:

<xarray.DataArray (month: 2)>
array([160.269218, 164.972725])
Coordinates:
  * month    (month) int64 1 7
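
For data shaped like the one in the question, the same groupby can be sketched as follows. The array here is a small synthetic stand-in (the name da_halfyearly, the grid size, and the random values are assumptions for illustration, not the original data). It also checks that the month == 1 group matches the [::2] slicing approach from the question:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for da_output_halfyearly: half-yearly steps on a tiny grid
times = pd.date_range('1972-01-01', periods=20, freq='6MS')
rng = np.random.default_rng(0)
da_halfyearly = xr.DataArray(
    rng.integers(0, 100, size=(len(times), 3, 4)),
    coords={
        'time': times,
        'latitude': np.linspace(-39.2, -33.9, 3),
        'longitude': np.linspace(140.8, 150.0, 4),
    },
    dims=('time', 'latitude', 'longitude'),
)

# One std map per group; 'time' is replaced by a 'month' dimension (values 1 and 7)
by_month = da_halfyearly.groupby('time.month').std('time')

# The January group should agree with slicing out every other time step
jan_from_groupby = by_month.sel(month=1).transpose('latitude', 'longitude')
jan_from_slicing = da_halfyearly[::2].std('time').transpose('latitude', 'longitude')
print(np.allclose(jan_from_groupby.values, jan_from_slicing.values))
```

Unlike the two separate slices, the grouped result keeps both halves in one DataArray, indexed by month.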

If you want labels that are a little more descriptive, you could construct a DataArray to use for grouping, e.g.

jan_jun = xr.full_like(resampled.time, 'jan-jun', dtype='<U7')
jul_dec = xr.full_like(resampled.time, 'jul-dec', dtype='<U7')
labels = xr.where(resampled.time.dt.month == 1, jan_jun, jul_dec)
labels = labels.rename('time')
result = resampled.groupby(labels).std('time')

In this case, the result looks like:

<xarray.DataArray (time: 2)>
array([160.269218, 164.972725])
Coordinates:
  * time   (time) object 'jan-jun' 'jul-dec'
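
A shorter alternative, sketched here under the assumption that the month groups come out in sorted order (1, then 7), is to group by month first and relabel the coordinate afterwards. The dimension name 'half' and the toy input are assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy half-yearly series standing in for `resampled`
times = pd.date_range('2000-01-01', periods=16, freq='6MS')
resampled = xr.DataArray(np.arange(len(times), dtype=float), [('time', times)])

# Group by calendar month (1 or 7), then swap the integer labels for strings
by_month = resampled.groupby('time.month').std('time')
relabelled = (
    by_month
    .assign_coords(month=['jan-jun', 'jul-dec'])  # relies on group order 1, 7
    .rename(month='half')                         # hypothetical dimension name
)
print(relabelled)
```

This avoids building a separate label array, at the cost of relying on the order of the groups.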

Upvotes: 1
