I want to use Pandas and Python to iterate through my .csv files, group the data by season and calculate the mean for each season in the year. Currently my quarterly script groups Jan-Mar, Apr-Jun, etc. I want the seasons to correspond to months as follows: 11: 'Winter', 12: 'Winter', 1: 'Winter', 2: 'Spring', 3: 'Spring', 4: 'Spring', 5: 'Summer', 6: 'Summer', 7: 'Summer', 8: 'Autumn', 9: 'Autumn', 10: 'Autumn'
I have the following data:
Date,HAD
01/01/1951,1
02/01/1951,-0.13161201
03/01/1951,-0.271796132
04/01/1951,-0.258977158
05/01/1951,-0.198823057
06/01/1951,0.167794502
07/01/1951,0.046093808
08/01/1951,-0.122396694
09/01/1951,-0.121824587
10/01/1951,-0.013002463
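Because the season depends on the month, how these dates are parsed matters. The code below reads them with dayfirst=True, so 02/01/1951 becomes 2 January 1951 and every row shown falls in January (Winter). Read month-first, the same strings would become January, February and March instead. A small check on the first three rows (the inline sample is copied from the data above):

```python
import io

import pandas as pd

# First rows of the sample data from the question.
SAMPLE = """Date,HAD
01/01/1951,1
02/01/1951,-0.13161201
03/01/1951,-0.271796132
"""

# Same text, parsed day-first and month-first.
day_first = pd.read_csv(io.StringIO(SAMPLE), parse_dates=[0], dayfirst=True)
month_first = pd.read_csv(io.StringIO(SAMPLE), parse_dates=[0], dayfirst=False)

# dayfirst=True reads 02/01/1951 as 2 January 1951;
# dayfirst=False reads it as 1 February 1951.
print(day_first["Date"].dt.month.tolist())
print(month_first["Date"].dt.month.tolist())
```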
This is my code so far:
import glob
import os

import pandas as pd

# Iterate through a list of files in a folder looking for .csv files
for csvfilename in glob.glob("C:/Users/n-jones/testdir/output/*.csv"):
    # Allocate a new file name for each file (os.path.basename keeps only the file name)
    csvfilenameonly = "RBI-Seasons-Year" + os.path.basename(csvfilename)
    # Create a pandas dataframe from the input csv file, indexed by date
    df = pd.read_csv(csvfilename, parse_dates=[0], index_col=[0], dayfirst=True)
    # Quarterly means; resample(..., how='mean') is written .resample(...).mean() in current pandas
    mean = df.resample('Q-SEP').mean()
    # Output to new csv file
    mean.to_csv("C:/Users/n-jones/testdir/season/" + csvfilenameonly)
I hope this makes some sense.
Thank you in advance!
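Since each of these seasons spans three consecutive months, one option (not from the original post) is to keep the quarterly approach but anchor the quarters on January, so they run Nov-Jan, Feb-Apr, May-Jul and Aug-Oct. A minimal sketch, assuming a date-indexed frame like the one the code above builds; the monthly sample values are made up for illustration. With resample the equivalent anchor would be 'Q-JAN' (spelled 'QE-JAN' from pandas 2.2 on); converting the index to periods sidesteps that alias change.

```python
import pandas as pd

# Made-up monthly data from Nov 1950 to Oct 1951, standing in for one input file.
dates = pd.date_range("1950-11-01", periods=12, freq="MS")
df = pd.DataFrame({"HAD": [float(i) for i in range(12)]}, index=dates)

# Quarterly periods whose fiscal year ends in January, so each quarter is
# Nov-Jan, Feb-Apr, May-Jul or Aug-Oct, matching the season months.
quarters = df.index.to_period("Q-JAN")
seasonal_mean = df.groupby(quarters)["HAD"].mean()

# Label each group by the first day of its season instead of a fiscal quarter.
seasonal_mean.index = seasonal_mean.index.start_time
print(seasonal_mean)
```

Each group here averages exactly three months, and the labels 1950-11-01, 1951-02-01, 1951-05-01 and 1951-08-01 mark the start of Winter, Spring, Summer and Autumn respectively.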
It looks like you just need a dict lookup and a groupby. The code below should work.
import pandas as pd
import os
import re

# Month number -> season name
lookup = {
    11: 'Winter',
    12: 'Winter',
    1: 'Winter',
    2: 'Spring',
    3: 'Spring',
    4: 'Spring',
    5: 'Summer',
    6: 'Summer',
    7: 'Summer',
    8: 'Autumn',
    9: 'Autumn',
    10: 'Autumn'
}

os.chdir('C:/Users/n-jones/testdir/output/')
for fname in os.listdir('.'):
    if re.match(".*csv$", fname):
        data = pd.read_csv(fname, parse_dates=[0], dayfirst=True)
        # Tag each row with the season of its month
        data['Season'] = data['Date'].apply(lambda x: lookup[x.month])
        data['count'] = 1
        # Select several columns with a list (double brackets)
        data = data.groupby(['Season'])[['HAD', 'count']].sum()
        data['mean'] = data['HAD'] / data['count']
        data.to_csv('C:/Users/n-jones/testdir/season/' + fname)
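Note that grouping on Season alone pools each season across every year in a file. If you want one mean per season per year, one option is to add a season-year key. A minimal sketch, assuming November and December should count toward the winter that ends in the following January; that convention, the helper name seasonal_means and the sample values are illustrative, not from the answer above. It also calls .mean() directly, which gives the same result as the sum/count columns.

```python
import pandas as pd

# Month number -> season name, as in the answer above.
LOOKUP = {11: "Winter", 12: "Winter", 1: "Winter",
          2: "Spring", 3: "Spring", 4: "Spring",
          5: "Summer", 6: "Summer", 7: "Summer",
          8: "Autumn", 9: "Autumn", 10: "Autumn"}


def seasonal_means(data):
    """Hypothetical helper: mean HAD per (season year, season)."""
    month = data["Date"].dt.month
    season = month.map(LOOKUP)
    # Assumed convention: Nov and Dec belong to the winter ending next January.
    season_year = data["Date"].dt.year + (month >= 11).astype(int)
    keys = [season_year.rename("SeasonYear"), season.rename("Season")]
    return data.groupby(keys)["HAD"].mean()


# Made-up rows spanning one winter and part of the following spring.
sample = pd.DataFrame({
    "Date": pd.to_datetime(["1950-11-15", "1950-12-15", "1951-01-15",
                            "1951-02-15", "1951-03-15"]),
    "HAD": [1.0, 2.0, 3.0, 10.0, 20.0],
})
result = seasonal_means(sample)
print(result)
```

With this convention the November and December 1950 rows are averaged together with January 1951 as the 1951 winter.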