Philip Blankenau

Reputation: 171

Python Pandas: Creating seasonal DateOffset object?

I have a datetime-indexed DataFrame with an hourly frequency. I would like to produce a groupby object, grouping by season. By season I mean that spring is months 3, 4, 5, summer is 6, 7, 8, and so on. I would like a unique group for each year-season combination. Is there a way to do this with a custom DateOffset? Would it require a subclass? Or am I better off just producing a season column and then doing grouper = df.groupby([df['season'], df.index.year])?

Current code is ugly:

def group_season(df):
    """
    This uses the meteorological seasons
    """
    df['month'] = df.index.month
    spring = df['month'].isin([3,4,5])
    spring[spring] = 'spring'
    summer = df['month'].isin([6,7,8])
    summer[summer] = 'summer'
    fall = df['month'].isin([9,10,11])
    fall[fall] = 'fall'
    winter = df['month'].isin([12,1,2])
    winter[winter] = 'winter'
    df['season'] = pd.concat([winter[winter != False], spring[spring != False],\
    fall[fall != False], summer[summer != False]], axis=0)

    return df.groupby([df['season'], df.index.year])

Upvotes: 2

Views: 1339

Answers (1)

Alicia Garcia-Raboso

Reputation: 13913

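For the kind of grouping you want to do, use anchored quarterly offsets. The alias 'QS-MAR' is a quarter-start frequency anchored so that quarters begin in March, June, September and December, which matches the meteorological seasons.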

import numpy as np
import pandas as pd

dates = pd.date_range('2016-01', freq='MS', periods=12)
df = pd.DataFrame({'num': np.arange(12)}, index=dates)
print(df)

#             num
# 2016-01-01    0
# 2016-02-01    1
# 2016-03-01    2
# 2016-04-01    3
# 2016-05-01    4
# 2016-06-01    5
# 2016-07-01    6
# 2016-08-01    7
# 2016-09-01    8
# 2016-10-01    9
# 2016-11-01   10
# 2016-12-01   11

by_season = df.resample('QS-MAR').sum()
print(by_season)

#             num
# 2015-12-01    1
# 2016-03-01    9
# 2016-06-01   18
# 2016-09-01   27
# 2016-12-01   11
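Since the question asks for a groupby object rather than an aggregated result, the same anchored offset can be passed to pd.Grouper. The following is a minimal sketch, assuming hourly data like the asker describes; the names hourly, grouped, season_sizes and season_starts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly frame covering all of 2016 (a leap year: 366 days)
hours = pd.date_range('2016-01-01', periods=366 * 24, freq=pd.offsets.Hour())
hourly = pd.DataFrame({'num': np.ones(len(hours))}, index=hours)

# Group on the index using quarters that start in Mar, Jun, Sep and Dec
grouped = hourly.groupby(pd.Grouper(freq='QS-MAR'))

# Each group key is the timestamp at which that season starts
season_sizes = grouped.size()
season_starts = [key for key, _ in grouped]
print(season_sizes)
```

Each year-season combination gets its own group, and January and February land in the season that started the previous December.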

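You can also make nicer, more descriptive labels in the index. Each label uses the year in which the season starts, so January and February 2016 fall under "winter 2015":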

SEASONS = {
    'winter': [12, 1, 2],
    'spring': [3, 4, 5],
    'summer': [6, 7, 8],
    'fall': [9, 10, 11]
}
# Invert SEASONS into a month -> season lookup
MONTHS = {month: season for season, months in SEASONS.items()
                        for month in months}

by_season.index = (pd.Series(by_season.index.month).map(MONTHS) +
                   ' ' + by_season.index.year.astype(str))
print(by_season)

#              num
# winter 2015    1
# spring 2016    9
# summer 2016   18
# fall 2016     27
# winter 2016   11
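For comparison, here is a rough sketch of the season-column idea from the question, using the same sample data. It assumes a month-to-season lookup (SEASON_OF_MONTH, a name invented here) and groups by calendar year, so it is not equivalent to the anchored quarters: January, February and December 2016 all end up in the same ('winter', 2016) group.

```python
import numpy as np
import pandas as pd

# Hypothetical month -> season lookup (meteorological seasons)
SEASON_OF_MONTH = {12: 'winter', 1: 'winter', 2: 'winter',
                   3: 'spring', 4: 'spring', 5: 'spring',
                   6: 'summer', 7: 'summer', 8: 'summer',
                   9: 'fall', 10: 'fall', 11: 'fall'}

dates = pd.date_range('2016-01', freq='MS', periods=12)
df = pd.DataFrame({'num': np.arange(12)}, index=dates)

# Season label for every row, aligned with the frame's index
season = pd.Series(df.index.month, index=df.index).map(SEASON_OF_MONTH)

# Group by (season, calendar year); winter is NOT split at December here
by_col = df.groupby([season, df.index.year])['num'].sum()
print(by_col)
```

If winter should be one contiguous December-to-February block, the anchored 'QS-MAR' offset is the simpler choice.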

Upvotes: 4
