newbie

Reputation: 407

Using groupby on a pandas DataFrame to group by financial year

I have a dataframe with a datetime64 column called DT. Is it possible to use groupby to group by financial year from April 1 to March 31?

For example,

    Date | PE_LOW 
    2010-04-01 | 15.44
    ...
    2011-03-31 | 16.8
    2011-04-02 | 17.
    ...
    2012-03-31 | 17.4

For the above data, I want to group by Fiscal Year 2010-2011 and Fiscal Year 2011-2012 without creating an extra column.

Upvotes: 5

Views: 3508

Answers (5)

Galen

Reputation: 1352

Similar to this answer, but (at the time of first posting) I needed the fiscal year to be reported as 2023, that is, labelled by the calendar year in which it ends. This is achieved by reversing the inequality and changing the decrement to an increment.

def fiscal_year(dt):
    # Label the fiscal year by the calendar year in which it ends.
    year = dt.year
    if dt.month >= 4:  # April starts the new fiscal year
        year += 1
    return year
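The same end-year labelling can also be computed without a Python-level function. The following is a minimal sketch, assuming a frame named `df` with a datetime64 column `Date` and a numeric column `PE_LOW` as in the question; the sample values are illustrative and the names `fy_end` and `fy_max` are hypothetical.

```python
import pandas as pd

# Sample rows modelled on the question; the values are illustrative only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# Fiscal year labelled by its ending calendar year:
# dates from April onwards belong to the year that ends next March.
fy_end = (df["Date"].dt.year + (df["Date"].dt.month >= 4).astype(int)).rename("fiscal_year")

# Group by the computed Series directly; no column is added to df.
fy_max = df.groupby(fy_end)["PE_LOW"].max()
print(fy_max)
```

Because the key is a separate Series aligned on the index, `df` itself keeps only its original two columns.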

Upvotes: 0

s_pike

Reputation: 2113

The simplest method I've found for this (similar to Alex's answer, but slightly more concise):

df.groupby([pd.Grouper(key='DateColumn', freq="A-MAR")])

If you want the year to finish on the last working day, you can use freq="BA-MAR". (pandas 2.2 and later spell these aliases "YE-MAR" and "BYE-MAR".)
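As a runnable illustration of the Grouper approach, here is a minimal sketch on sample rows modelled on the question. It passes the offset object `pd.offsets.YearEnd(month=3)` rather than a string alias, on the assumption that this sidesteps alias renames between pandas versions; the column names and values are illustrative.

```python
import pandas as pd

# Sample rows modelled on the question; the values are illustrative only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# YearEnd(month=3) describes years that end on 31 March.
by_fy = df.groupby(pd.Grouper(key="Date", freq=pd.offsets.YearEnd(month=3)))

# Each group is labelled with the last day of its fiscal year.
fy_max = by_fy["PE_LOW"].max()
print(fy_max)
```

Note that the group labels are timestamps (the fiscal year end), not integer years.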

Upvotes: 0

Alex

Reputation: 11

I had a similar problem and used the following to offset the business year end to March (month=3) using Grouper and specifying the frequency:

grouped_df = df.groupby([pd.Grouper(key='DateColumn', freq=pd.tseries.offsets.BYearEnd(month=3))])

See the pandas documentation for Business Year End (BYearEnd) and Grouper.
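One thing worth checking with a business year end is what happens to rows dated on a weekend. The sketch below, on illustrative sample rows modelled on the question, counts the rows per group; 2012-03-31 is a Saturday, so the business year ends on Friday 2012-03-30 and that row should fall into the following bin.

```python
import pandas as pd

# Sample rows modelled on the question; the values are illustrative only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# Business year end in March: the last weekday of March each year.
freq = pd.tseries.offsets.BYearEnd(month=3)

# Count rows per business fiscal year.
sizes = df.groupby(pd.Grouper(key="Date", freq=freq)).size()
print(sizes)
```

If rows can carry weekend dates and should still count towards the fiscal year ending 31 March, a calendar year end offset may be the safer choice.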

Upvotes: 1

Jihun

Reputation: 1485

With pandas.DatetimeIndex, that is very simple:

DT.groupby(pd.DatetimeIndex(DT.Date).shift(-3,freq='m').year)

Or if you use Date as an index of DT, it is even simpler:

DT.groupby(DT.index.shift(-3,freq='m').year)

But beware that shift(-3,freq='m') moves each date to a month end; for example, 8 Apr becomes 31 Jan. Only the year is used for grouping, so it still fits your problem well.
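To see what the shift does to each date, here is a minimal sketch on illustrative sample rows modelled on the question. It uses the offset object `pd.offsets.MonthEnd()` in place of the 'm' string alias, on the assumption that the object form behaves the same across pandas versions; `shifted` and `fy_start` are hypothetical names.

```python
import pandas as pd

# Sample rows modelled on the question; the values are illustrative only.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# Move every date back by three month ends, e.g. 2010-04-01 -> 2010-01-31
# and 2011-03-31 -> 2010-12-31.
shifted = pd.DatetimeIndex(df["Date"]).shift(-3, freq=pd.offsets.MonthEnd())

# The year of the shifted date is the calendar year in which the fiscal year starts.
fy_start = shifted.year

fy_max = df.groupby(fy_start)["PE_LOW"].max()
print(fy_max)
```

Here the groups are labelled by the starting year (2010 for April 2010 to March 2011), unlike the Grouper-based answers, which label by the fiscal year end.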

Upvotes: 3

ZJS

Reputation: 4051

The first thing you want to do is define a function that outputs the financial year as a value. You could use the following.

def getFiscalYear(dt):
    year = dt.year
    if dt.month < 4:
        year -= 1
    return year

You say you don't want to use an extra column to group the frame. Typically the groupby method is called like df.groupby("colname"); however, that statement is semantically equivalent to df.groupby(df["colname"]), meaning you can do something like this:

grouped = DT.groupby(DT['Date'].apply(getFiscalYear))

You can then apply a method to the groups, or whatever else you want to do. If you just want to see how the rows are split, inspect grouped.groups (an attribute, not a method).
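Put together, the approach can be sketched end to end as follows. The frame `DT` and its values are illustrative sample rows modelled on the question, and `get_fiscal_year` and `row_labels` are hypothetical names chosen for this example.

```python
import pandas as pd

def get_fiscal_year(dt):
    """Return the calendar year in which the April-to-March fiscal year starts."""
    return dt.year - 1 if dt.month < 4 else dt.year

# Sample rows modelled on the question; the values are illustrative only.
DT = pd.DataFrame({
    "Date": pd.to_datetime(["2010-04-01", "2011-03-31", "2011-04-02", "2012-03-31"]),
    "PE_LOW": [15.44, 16.8, 17.0, 17.4],
})

# The grouping key is computed on the fly; DT gains no extra column.
grouped = DT.groupby(DT["Date"].apply(get_fiscal_year))

# Iterate over (fiscal year, sub-frame) pairs.
for fy, frame in grouped:
    print(fy, len(frame))

# grouped.groups maps each fiscal year to the row labels in that group.
row_labels = {fy: list(idx) for fy, idx in grouped.groups.items()}
print(row_labels)
```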

Upvotes: 4
