Pandas TimeGrouper: Boundaries for the grouping

Question

I am currently grouping my data by time using

df.groupby(pd.TimeGrouper('AS'))

which gives me annual groups. However, I would like these groups to start in March, specifically on xxxx-03-01 of each year.

One way to enforce this would be to ensure that my first data point falls on March 1st, or that my last data point falls on February 28th, and use closed='right'. Neither of these is feasible for me at the moment. How else could I group annually, from March to March?

FooBar · Accepted Answer

Inspired by @cphlewis, here is my groupBy method that groups yearly but starts at a given month:

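import numpy as np
import pandas as pd

rng = pd.date_range('1/1/2011', periods=25, freq='M')  # month-end dates; newer pandas spells this 'ME'
ts = pd.DataFrame(np.random.randn(len(rng)), index=rng, columns=['ts'])

def groupByYearMonth(ts, month):
    # Adds a 'group' column to ts in place. Assumes one entry per month.
    starts = ts[ts.index.month == month].index

    # Rows before the first start month belong to the previous year's group.
    if starts[0] > ts.index[0]:
        ts.loc[ts.index < starts[0], 'group'] = starts[0].year - 1
    for start in starts:
        # Last month of the group; wraps to December when month == 1.
        end = '%d-%d' % ((start.year, 12) if month == 1 else (start.year + 1, month - 1))
        ts.loc[start:end, 'group'] = start.year
    return ts.groupby('group')

groupBy = groupByYearMonth(ts, 3)
print(groupBy.mean(), groupBy.size())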
             ts
group          
2010   0.638609
2011  -0.124718
2012   0.385539 group
2010      2
2011     12
2012     11
dtype: int64
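
The same grouping can also be sketched without a loop by computing the group label arithmetically: any row whose month is earlier than the start month belongs to the year that began in the previous calendar year. This is a minimal sketch and not the code from the answer above; the helper name year_labels and the sample frame df are made up for illustration, and the sample uses month-start dates ('MS') instead of the month-end dates used above. It does not add a column to the frame and does not assume one entry per month.

```python
import numpy as np
import pandas as pd

def year_labels(index, month):
    """Label each timestamp with the calendar year in which its shifted year began.

    `year_labels` is a hypothetical helper, not part of pandas.
    """
    # Months before `month` belong to the year that started in the previous calendar year.
    return np.asarray(index.year) - (np.asarray(index.month) < month).astype(int)

# Illustrative data: 25 month-start dates from January 2011 (values are arbitrary).
rng = pd.date_range('2011-01-01', periods=25, freq='MS')
df = pd.DataFrame({'ts': np.arange(len(rng), dtype=float)}, index=rng)

# Group by the computed labels; df itself is left unmodified.
grouped = df.groupby(year_labels(df.index, 3))
sizes = grouped.size()
print(sizes)
```

For this sample the groups 2010, 2011 and 2012 contain 2, 12 and 11 rows, the same sizes as in the output above. In pandas versions that provide pd.Grouper, passing an anchored year-start offset such as pd.offsets.YearBegin(month=3) as freq should give similar March-to-March bins, labelled by their start date instead of by year.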
