Reputation: 123
I have seasonal snow data that I want to group by snow year (July 1, 1954 to June 30, 1955) rather than having one winter's data split across two calendar years (January 1 to December 31, 1954 and January 1 to December 31, 1955).
I modified the code from this question:
Using pandas to select specific seasons from a dataframe whose values are over a defined threshold (thanks Pad)
def get_season(row):
    # July 1 starts the next snow year, so July-December count toward year + 1
    if row['date'].month < 7:
        return row['date'].year
    else:
        return row['date'].year + 1

df['Seasonal_Year'] = df.apply(get_season, axis=1)
Is there a better way to do this?
Upvotes: 4
Views: 593
Reputation: 294358
You can use pd.offsets.MonthBegin.
Consider the dataframe of dates df:
df = pd.DataFrame(dict(Date=pd.date_range('2010-01-30', periods=24, freq='M')))
We can offset the Date column by seven month starts, take the year, and add one:
df.assign(Season=(df.Date - pd.offsets.MonthBegin(7)).dt.year + 1)
         Date  Season
0  2010-01-31    2010
1  2010-02-28    2010
2  2010-03-31    2010
3  2010-04-30    2010
4  2010-05-31    2010
5  2010-06-30    2010
6  2010-07-31    2011
7  2010-08-31    2011
8  2010-09-30    2011
9  2010-10-31    2011
10 2010-11-30    2011
11 2010-12-31    2011
12 2011-01-31    2011
13 2011-02-28    2011
14 2011-03-31    2011
15 2011-04-30    2011
16 2011-05-31    2011
17 2011-06-30    2011
18 2011-07-31    2012
19 2011-08-31    2012
20 2011-09-30    2012
21 2011-10-31    2012
22 2011-11-30    2012
23 2011-12-31    2012
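One caveat worth checking on your own data: a date that falls exactly on the first of a month already sits on the MonthBegin offset, so subtracting MonthBegin(7) from July 1 rolls back a full seven months and lands in the previous season. A minimal sketch of an alternative, assuming the snow year starts on July 1 as in the question, is to shift every date forward six months with pd.DateOffset and read the year. The sample dates and the names dates, season_month_begin and season_shift are made up for illustration.

```python
import pandas as pd

# Hypothetical sample dates around the July 1 boundary
dates = pd.Series(pd.to_datetime(
    ['2010-06-30', '2010-07-01', '2010-07-31', '2011-06-30']))

# Approach from this answer: step back seven month starts, then add one year
season_month_begin = (dates - pd.offsets.MonthBegin(7)).dt.year + 1

# Alternative sketch: shift forward six calendar months, then read the year
season_shift = (dates + pd.DateOffset(months=6)).dt.year

print(pd.DataFrame({
    'Date': dates,
    'MonthBegin': season_month_begin,
    'Shift': season_shift,
}))
```

The two columns should agree everywhere except on 2010-07-01, which only matters if your data can contain readings dated July 1.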
Upvotes: 3
Reputation: 862851
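I think so, with numpy.where:
import numpy as np
years = df['date'].dt.year
# Months before July keep the calendar year; July onward belongs to the next snow year
df['Seasonal_Year'] = np.where(df['date'].dt.month < 7, years, years + 1)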
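As a rough end-to-end sketch, the seasonal label can then be used as a groupby key. This assumes the snow year starts on July 1 as described in the question, so July counts toward the following season. The snow_cm column and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical snowfall readings; 'snow_cm' is an assumed column name
df = pd.DataFrame({
    'date': pd.to_datetime(['1954-06-30', '1954-07-01', '1954-12-15',
                            '1955-01-20', '1955-06-30', '1955-07-01']),
    'snow_cm': [0.0, 0.0, 12.0, 8.0, 0.0, 1.0],
})

years = df['date'].dt.year
# January-June keep the calendar year; July-December are labelled year + 1
df['Seasonal_Year'] = np.where(df['date'].dt.month < 7, years, years + 1)

# Total snowfall per snow year
season_totals = df.groupby('Seasonal_Year')['snow_cm'].sum()
print(season_totals)
```

Here the December 1954 and January 1955 readings end up in the same group (1955), which is the point of labelling by snow year.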
Upvotes: 3