user3017048

Reputation: 2911

Dataset statistics with custom begin of the year

I would like to compute annual statistics (a cumulative sum) on a daily time series in an xarray dataset. The tricky part is that the day on which my year begins must be flexible, and the time series contains leap years.

I tried e.g. the following:

import numpy as np
import pandas as pd
import xarray as xr

rollday = -181
dr = pd.date_range('2015-01-01', '2017-08-23')
foo = xr.Dataset({'data': (['time'], np.ones(len(dr)))}, coords={'time': dr})
foo_groups = foo.roll(time=rollday).groupby(foo.time.dt.year)
foo_cumsum = foo_groups.apply(lambda x: x.cumsum(dim='time', skipna=True))

which is "unfavorable" mainly for two reasons: (1) the rolling doesn't account for leap years, so I get an offset of one day per leap year, and (2) the beginning of the first year (until the end of June) is appended to the end of the rolled time series, which creates a "fake year" in which the cumulative sums no longer make sense.

I also tried cutting off the ends of the time series first, but then the rolling no longer works. Resampling did not seem to be an option either, as I could not find a fitting pandas freq string.

I'm sure there is a better/correct way to do this. Can somebody help?

Upvotes: 0

Views: 331

Answers (1)

jhamman

Reputation: 6434

You can use an xarray.DataArray that specifies the groups. One way to do this is to create an array of values (years) that serve as the group ids:

import numpy as np
import pandas as pd
import xarray as xr

# set up sample data
dr = pd.date_range('2015-01-01', '2017-08-23')
foo = xr.Dataset({'data': (['time'], np.ones(len(dr)))}, coords={'time': dr})

# create an array of years (modify day/month for your use case);
# here a custom year starts on 15 September and is labelled by the year it ends in
my_years = xr.DataArray(
    [t.year if ((t.month < 9) or ((t.month == 9) and (t.day < 15))) else (t.year + 1)
     for t in foo.indexes['time']],
    dims='time', name='my_years', coords={'time': dr})

# use that array of years (integers) to do the groupby
# (recent xarray versions prefer .map over .apply)
foo_cumsum = foo.groupby(my_years).apply(lambda x: x.cumsum(dim='time', skipna=True))

# Voila!
foo_cumsum['data'].plot()
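
If the start of the year needs to change often, the list comprehension above could be wrapped in a small helper. The following is only a sketch: `custom_year_labels`, `start_month` and `start_day` are names I made up for illustration, and it assumes an xarray version where `groupby(...).map` is available. It builds the same labels from the month and day of each timestamp, so leap years need no special handling.

```python
import numpy as np
import pandas as pd
import xarray as xr


def custom_year_labels(time_index, start_month=9, start_day=15):
    """Hypothetical helper: label each timestamp with its custom year.

    A custom year starts on (start_month, start_day) and is labelled by the
    calendar year in which it ends, as in the list comprehension above.
    """
    month = np.asarray(time_index.month)
    day = np.asarray(time_index.day)
    # True for dates on or after the start day within their calendar year
    after_start = (month > start_month) | ((month == start_month) & (day >= start_day))
    return np.asarray(time_index.year) + after_start.astype(int)


# same sample data as above
dr = pd.date_range('2015-01-01', '2017-08-23')
foo = xr.Dataset({'data': (['time'], np.ones(len(dr)))}, coords={'time': dr})

# integer group ids, one per time step
my_years = xr.DataArray(custom_year_labels(dr), dims='time',
                        name='my_years', coords={'time': dr})

# cumulative sum restarted at the beginning of every custom year
foo_cumsum = foo.groupby(my_years).map(lambda x: x.cumsum(dim='time', skipna=True))
```

For a year that begins on 1 July, which is roughly what the 181-day roll in the question was aiming for, you would call `custom_year_labels(dr, start_month=7, start_day=1)`.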


Upvotes: 2
