Stavros Anastasiadis
Stavros Anastasiadis

Reputation: 413

Pandas, groupby and summing over specific months

I have a DataFrame :

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 982 entries, 2009-10-30 00:00:00 to 2012-12-16 00:00:00
Data columns (total 4 columns):
rain        981  non-null values
temp_max    982  non-null values
temp_min    982  non-null values
temp        982  non-null values
dtypes: float64(4)

For summing per Year/Month i use :

mdata = data.groupby([lambda x: x.year, lambda x: x.month]).agg([sum])

But i need Seasonal analysis (summer, winter etc), so how i can create the Sum of specific months like [1 ,2 ,3] of each year?

Ty

Upvotes: 3

Views: 4939

Answers (1)

Woody Pride
Woody Pride

Reputation: 13955

Yes, one solution which seems neat to me is to use a Seasons dictionary and then group the data using a function. Any function passed as a group key is called once per index value and the return values are used as the group names.

import pandas as pd
import numpy as np
from pandas import DataFrame
import datetime
# Create a year's worth of data
base = datetime.date.today() - datetime.timedelta(365)
Datelist = [base + datetime.timedelta(days = x) for x in range(365)]
DF = DataFrame(np.random.rand(365), index = Datelist)

# Create a Seasonal Dictionary that will map months to seasons
SeasonDict = {11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Spring', 3: 'Spring', 4: 'Spring', 5: 'Summer', 6: 'Summer', 7: 'Summer', \
8: 'Autumn', 9: 'Autumn', 10: 'Autumn'}

# Write a function that will be used to group the data
def GroupFunc(x):
    return SeasonDict[x.month]

# Call the function with the groupby operation. 
Grouped = DF.groupby(GroupFunc)
Grouped.sum()

The function takes each index value and looks up the month in the Seasons Dictionary and returns the value corresponding to the month key. This value then becomes the group name.

Alternatively you can use the lambda as in your example (which is more efficient, but I thought the above would be easier to understand):

DF.groupby(lambda x: SeasonDict[x.month]).sum()

ADDITIONAL CODE AS PER COMMENTS It seems to me like you would be better off slicing the data. So you could do the following

DF['Season'] = ""
for row in DF.index:
    DF.Season[row] = SeasonDict[row.month]
DFWinter = DF[DF.Season == 'Winter']

Now you have a new data frame with the winter data in, to play with as you desire. The difference is that the groupby operations allow you to undertake the same operations on all the data, whereas it sounds like you wanted to investigate the properties of different parts of your data set in different ways. To do that its better to slice, in this case using Boolean slicing.

Upvotes: 4

Related Questions