alextc

Reputation: 3515

How to use xarray's groupby function to group data by a combination of year and month

I have a DataArray holding a daily dataset that spans a few years. It has one variable and three dimensions named latitude, longitude and time (daily). The time coordinate looks like this: time (time) datetime64[ns] 2016-01-01 2016-01-02 ... 2018-12-31

I would like to group the data by a combination of year and month using the DataArray's groupby function. However, the following code groups by month only, across all years, and gives me an int64 month coordinate with the values 1, 2, 3, ..., 12.

da_groupby_monthly = da.groupby('time.month').sum('time')
print(da_groupby_monthly)

Output:

<xarray.DataArray (month: 12, latitude: 106, longitude: 193)>
dask.array<shape=(12, 106, 193), dtype=int32, chunksize=(1, 106, 193)>
Coordinates:
  * latitude   (latitude) float32 -39.2 -39.149525 ... -33.950478 -33.9
  * longitude  (longitude) float32 140.8 140.84792 140.89584 ... 149.95209 150.0
  * month      (month) int64 1 2 3 4 5 6 7 8 9 10 11 12

How can I keep the datetime64[ns] dtype for time and get month coordinates like "2016-01", "2016-02", "2016-03", ..., "2018-12"?

Upvotes: 4

Views: 10166

Answers (2)

echarliewhite

Reputation: 61

To do an xarray groupby operation on multiple variables (e.g. year and month) more generally, you can combine variables in a pandas MultiIndex, make it a non-dimension coordinate, and pass it to groupby:

import pandas as pd
year_month_idx = pd.MultiIndex.from_arrays([da['time.year'], da['time.month']])
da.coords['year_month'] = ('time', year_month_idx)
da_monthly = da.groupby('year_month').sum()

You can also create a MultiIndex for use with groupby by stacking coordinates. For example, given a set of latitude/longitude coordinates, you can group by all unique lat-lon locations:

da_stacked = da.stack(latlon=['lat','lon'])
da_stacked.groupby('latlon').sum()
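For the year-and-month case specifically, a simpler alternative to a MultiIndex is a single string label per time step. The following is only a minimal sketch, not part of the original answer: the synthetic `da` (all ones, with a made-up 2 x 3 grid) stands in for the real dataset, and the coordinate name `year_month` is an arbitrary choice.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the daily dataset in the question (assumed values).
time = pd.date_range("2016-01-01", "2018-12-31", freq="D")
da = xr.DataArray(
    np.ones((time.size, 2, 3), dtype="int32"),
    coords={
        "time": time,
        "latitude": [-39.2, -33.9],
        "longitude": [140.8, 145.0, 150.0],
    },
    dims=("time", "latitude", "longitude"),
)

# Build one "YYYY-MM" string per day and attach it as a
# non-dimension coordinate along the time dimension.
year_month = da["time"].dt.strftime("%Y-%m")
da = da.assign_coords(year_month=("time", year_month.values))

# Group by the combined label and sum over the days in each group.
da_by_year_month = da.groupby("year_month").sum("time")

print(da_by_year_month["year_month"].values[:3])
```

Note that the resulting `year_month` coordinate holds strings rather than datetime64 values, so this suits labelling better than further time-based indexing.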

Upvotes: 6

Dallas Lindauer

Reputation: 239

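I like using the resample method, which keeps time as a datetime64 coordinate with one entry per year-month (here labelled by the first day of each month). In xarray the dimension to resample is passed as a keyword argument rather than with on=. Try this:

da_monthly = da.resample(time='MS').sum()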
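To see what monthly resampling produces, here is a small self-contained sketch. It assumes a synthetic `da` filled with ones on a made-up 2 x 3 grid in place of the real data, and it uses the `'MS'` (month start) frequency alias, so each group is labelled with the first day of its month.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the daily dataset in the question (assumed values).
time = pd.date_range("2016-01-01", "2018-12-31", freq="D")
da = xr.DataArray(
    np.ones((time.size, 2, 3), dtype="int32"),
    coords={
        "time": time,
        "latitude": [-39.2, -33.9],
        "longitude": [140.8, 145.0, 150.0],
    },
    dims=("time", "latitude", "longitude"),
)

# One group per calendar month of each year; time stays a datetime64 coordinate.
da_monthly = da.resample(time="MS").sum()

# Optional: derive "YYYY-MM" strings from the datetime labels for display.
labels = da_monthly["time"].dt.strftime("%Y-%m").values.tolist()

print(da_monthly["time"].dtype)
print(labels[:3])
```

Because the data are all ones, each monthly sum equals the number of days in that month, which makes it easy to check that years are kept separate.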

Upvotes: 6
