Reputation: 16488
I am currently grouping my data by time using
df.groupby(pd.TimeGrouper('AS'))
which gives me annual groups. However, I would like these groups to start at March, to be precise xxxx-03-01
for every year.
One way to enforce this would be to ensure that my first data point is on A March first, or that my last data point ends on February 28th and use closed='right'
. None of these are feasible for me at the moment. How else could I group annually, from March to March?
Upvotes: 1
Views: 685
Reputation: 16488
Inspired by @cphlewis , here is my groupBy method that groups yearly, but starts at a given month:
rng = pd.date_range('1/1/2011', periods=25, freq='M')
ts = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['ts'])
def groupByYearMonth(ts, month):
starts = ts[ts.index.month==month].index # Fix if multiple entries per month.
if starts[0] > ts.index[0]:
ts.loc[ts.index < starts[0], 'group'] = starts[0].year - 1
for start in starts:
end = '%d-%d'%(start.year+1, start.month-1)
ts.loc[start:end, 'group'] = start.year
return ts.groupby('group')
groupBy = groupByYearMonth(ts, 3)
print groupBy.mean(), groupBy.size()
ts
group
2010 0.638609
2011 -0.124718
2012 0.385539 group
2010 2
2011 12
2012 11
dtype: int64
Upvotes: 0
Reputation: 16249
Inelegant, but I don't see that groupby has such an argument built in:
import pandas as pd
from numpy.random import randn
rng = pd.date_range('1/1/2011', periods=25, freq='M')
ts = pd.Series(randn(len(rng)), index=rng)
def truncYears(ts, month):
starts = ts[ts.index.month==month].index # Fix if multiple entries per month.
groups = {}
if starts[0] > ts.index[0]:
groups[ts.index[0]] = ts[ts.index < starts[0]]
for start in starts:
end = '%d-%d'%(start.year+1, start.month-1)
print(start, end)
groups[start] = ts[start:end]
return groups
groups = truncYears(ts, 3)
for k in groups:
print(groups[k])
result (note dict keys not sorted, so years not in order):
2011-01-31 -1.719806 2011-02-28 -0.657064 Freq: M, dtype: float64 2012-03-31 1.200984 2012-04-30 -0.496715 2012-05-31 -0.998218 2012-06-30 1.711504 2012-07-31 0.304211 2012-08-31 1.091810 2012-09-30 -0.716785 2012-10-31 -0.996493 2012-11-30 -0.541812 2012-12-31 1.027787 2013-01-31 0.249775 Freq: M, dtype: float64 2011-03-31 -1.406736 2011-04-30 0.245077 2011-05-31 -0.010090 2011-06-30 -1.459824 2011-07-31 0.150871 2011-08-31 -1.223533 2011-09-30 0.859539 2011-10-31 0.623674 2011-11-30 -2.071204 2011-12-31 0.254750 2012-01-31 0.667076 2012-02-29 0.076249 Freq: M, dtype: float64
Upvotes: 2