ma8

Reputation: 123

Pandas: define a seasonal year from July 1 - June 30 instead of Jan 1 - Dec 31

I have seasonal snow data that I want to group by snow year (July 1, 1954 - June 30, 1955) rather than having one winter's data split across two calendar years (January 1, 1954 - December 31, 1954 and January 1, 1955 - December 31, 1955).

example data

I modified the code from this question:

Using pandas to select specific seasons from a dataframe whose values are over a defined threshold (thanks Pad)

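def get_season(row):
    # January-June belong to the season that ends in the same calendar year;
    # July-December belong to the season that ends in the following year.
    if row['date'].month <= 6:
        return row['date'].year
    else:
        return row['date'].year + 1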

df['Seasonal_Year'] = df.apply(get_season, axis=1)

results of method call

Is there a better (ideally vectorized) way to do this?

Upvotes: 4

Views: 593

Answers (2)

piRSquared

Reputation: 294358

You can use pd.offsets.MonthBegin.

Consider a dataframe of dates, df:

import pandas as pd
df = pd.DataFrame(dict(Date=pd.date_range('2010-01-30', periods=24, freq='M')))

We can shift Date back by the offset and take the year:

df.assign(Season=(df.Date - pd.offsets.MonthBegin(7)).dt.year + 1)

         Date  Season
0  2010-01-31    2010
1  2010-02-28    2010
2  2010-03-31    2010
3  2010-04-30    2010
4  2010-05-31    2010
5  2010-06-30    2010
6  2010-07-31    2011
7  2010-08-31    2011
8  2010-09-30    2011
9  2010-10-31    2011
10 2010-11-30    2011
11 2010-12-31    2011
12 2011-01-31    2011
13 2011-02-28    2011
14 2011-03-31    2011
15 2011-04-30    2011
16 2011-05-31    2011
17 2011-06-30    2011
18 2011-07-31    2012
19 2011-08-31    2012
20 2011-09-30    2012
21 2011-10-31    2012
22 2011-11-30    2012
23 2011-12-31    2012
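
One caveat worth checking on your own data: as far as I can tell, subtracting MonthBegin(7) from a date that already falls on the first of a month steps back a full seven months, so July 1 itself would be labelled with the previous season. The sample above only contains month-end dates, so it does not hit that case. A minimal sketch of a plain month comparison that does not depend on the day of the month is below; the sample dates and the column names Season and Season_offset are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: dates on both sides of the July 1 boundary.
dates = pd.DataFrame({"Date": pd.to_datetime([
    "2010-06-30", "2010-07-01", "2010-07-31",
    "2010-12-31", "2011-01-01", "2011-06-30",
])})

# Label each season by the year in which it ends:
# July-December roll forward to the next calendar year.
is_second_half = (dates["Date"].dt.month >= 7).astype(int)
dates["Season"] = dates["Date"].dt.year + is_second_half

# The offset-based expression, kept alongside for comparison.
dates["Season_offset"] = (dates["Date"] - pd.offsets.MonthBegin(7)).dt.year + 1

print(dates)
```

Comparing the two columns on rows dated the first of a month shows whether the offset version matters for your data.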

Upvotes: 3

jezrael

Reputation: 862851

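Yes, I think so. numpy.where avoids the row-wise apply:

import numpy as np
years = df['date'].dt.year
# January-June keep the calendar year; July-December move to the next one
df['Seasonal_Year'] = np.where(df['date'].dt.month <= 6, years, years + 1)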
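
Once the seasonal year is a column, grouping by snow year is an ordinary groupby. Here is a rough sketch, assuming a date column plus a hypothetical snowfall column with made-up values; it uses month <= 6 as the cutoff so that July 1 opens a new season, matching the July 1 - June 30 definition in the question.

```python
import numpy as np
import pandas as pd

# Made-up snowfall values, purely for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1954-06-30", "1954-07-01", "1954-12-15",
        "1955-01-10", "1955-06-30", "1955-07-01",
    ]),
    "snowfall": [0.0, 0.0, 12.0, 8.0, 1.0, 0.0],
})

years = df["date"].dt.year
# January-June stay in the current year; July-December go to the next one.
df["Seasonal_Year"] = np.where(df["date"].dt.month <= 6, years, years + 1)

# Total snowfall per snow year (July 1 - June 30).
season_totals = df.groupby("Seasonal_Year")["snowfall"].sum()
print(season_totals)
```

With this data the December 1954 and January 1955 rows land in the same group (1955) instead of being split across two calendar years.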

Upvotes: 3
