Group data by seasons using python and pandas

Question

I want to use pandas and Python to iterate through my .csv files and group the data by season, calculating the mean for each season of the year. Currently the quarterly script groups Jan-Mar, Apr-Jun, etc. I want the seasons to correspond to months as follows: 11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Spring', 3: 'Spring', 4: 'Spring', 5: 'Summer', 6: 'Summer', 7: 'Summer', 8: 'Autumn', 9: 'Autumn', 10: 'Autumn'

I have the following data:

Date,HAD
01/01/1951,1
02/01/1951,-0.13161201
03/01/1951,-0.271796132
04/01/1951,-0.258977158
05/01/1951,-0.198823057
06/01/1951,0.167794502
07/01/1951,0.046093808
08/01/1951,-0.122396694
09/01/1951,-0.121824587
10/01/1951,-0.013002463

This is my code so far:

# Iterate through a list of files in a folder looking for .csv files
for csvfilename in glob.glob("C:/Users/n-jones/testdir/output/*.csv"):

    # Allocate a new file name for each file and create a new .csv file
    csvfilenameonly = "RBI-Seasons-Year" + path_leaf(csvfilename) 
    with open("C:/Users/n-jones/testdir/season/" + csvfilenameonly, "wb") as outfile:

        # Open the input csv file and allow the script to read it
        with open(csvfilename, "rb") as infile:

            # Create a pandas dataframe to summarise the data
            df = pd.read_csv(infile, parse_dates=[0], index_col=[0], dayfirst=True)

            mean = df.resample('Q-SEP').mean()

            # Output to new csv file
            mean.to_csv(outfile)

I hope this makes some sense.

Thank you in advance!

Yeqing Zhang · Accepted Answer

It looks like you just need a dict lookup and a groupby. The code below should work.

import pandas as pd
import os
import re

lookup = {
    11: 'Winter',
    12: 'Winter',
    1: 'Winter',
    2: 'Spring',
    3: 'Spring',
    4: 'Spring',
    5: 'Summer',
    6: 'Summer',
    7: 'Summer',
    8: 'Autumn',
    9: 'Autumn',
    10: 'Autumn'
}

os.chdir('C:/Users/n-jones/testdir/output/')

for fname in os.listdir('.'):
    # Only process files whose names end in "csv"
    if re.match(".*csv$", fname):
        data = pd.read_csv(fname, parse_dates=[0], dayfirst=True)
        # Map each date's month to its season name
        data['Season'] = data['Date'].apply(lambda x: lookup[x.month])
        data['count'] = 1
        # Sum values and row counts per season, then divide to get the mean
        data = data.groupby(['Season'])[['HAD', 'count']].sum()
        data['mean'] = data['HAD'] / data['count']
        data.to_csv('C:/Users/n-jones/testdir/season/' + fname)
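If you only need the mean, the manual count column is probably unnecessary, since groupby can compute the mean directly. Here is a minimal sketch of that variant, assuming the same Date and HAD columns as in the question. The helper name season_means and the small in-memory sample are made up for illustration and are not part of your data or script.

```python
import io

import pandas as pd

# Month-to-season mapping taken from the question.
SEASONS = {
    11: 'Winter', 12: 'Winter', 1: 'Winter',
    2: 'Spring', 3: 'Spring', 4: 'Spring',
    5: 'Summer', 6: 'Summer', 7: 'Summer',
    8: 'Autumn', 9: 'Autumn', 10: 'Autumn',
}


def season_means(df, date_col='Date', value_col='HAD'):
    """Hypothetical helper: mean of value_col per season, pooled over all years."""
    # Translate each row's month number into a season label.
    season = df[date_col].dt.month.map(SEASONS).rename('Season')
    # Group rows by that label and average the values in each group.
    return df.groupby(season)[value_col].mean()


# Invented sample rows in the question's day-first format (not real data).
sample = io.StringIO(
    "Date,HAD\n"
    "01/01/1951,1\n"
    "01/02/1951,3\n"
    "01/03/1951,5\n"
    "01/12/1951,2\n"
)
df = pd.read_csv(sample)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

result = season_means(df)
print(result)
```

Note that this pools every year together, just like the loop above. If you want one row per season per year, you could group by a year key as well, but you would then have to decide which year a Winter that spans November to January belongs to.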
