Bridget
Bridget

Reputation: 39

How to sort Months in Monthly order within Python Pandas when working with .CSV files?

For each NAME/LOCATION, calculate the average snow amount per month. Save the results in two separate .csv files (one for 2016 and the other for 2017) name the files average2016.csv and average2017.csv. I am used Python 3.8 with Panadas. I accomplished doing this with this code:

import numpy as np
import pandas as pd

df = pd.read_csv('filteredData.csv')

df['DATE'] = pd.to_datetime(df['DATE'])
df['year'] = pd.DatetimeIndex(df['DATE']).year


df16 = df[(df.year == 2016)]
df17 = df[(df.year == 2017)]

df_2016 = df16.groupby(['NAME', 'Month'])['SNOW'].mean().reset_index()
df_2017 = df17.groupby(['NAME', 'Month'])['SNOW'].mean().reset_index()

df_2016[['NAME', 'Month', 'SNOW']].to_csv('average2016.csv')
df_2017[['NAME', 'Month', 'SNOW']].to_csv('average2017.csv')

This image shows my results for average 2016.

A small portion of my average2016.csv file

However, the problem that I am having is that the Months are not in Monthly order. I want them to go from January through December for each location. Example: I want the NAME: ADA 0.7 SE, MI US Month's to be May then June. How would I be able to accomplish this? Also is there a way to get rid of the first numbered column?

Upvotes: 1

Views: 543

Answers (1)

ALollz
ALollz

Reputation: 59579

You can sort on the DATE column. But then you need to remember to do sort=False in your groupby, else it will sort there using the string ordering. In addition, your repetitive code for each year can be replaced with a single groupby, adding year to the grouping keys. Then you'd then write separately into different files and index=False is how you get rid of the Index.

import numpy as np
import pandas as pd

df = pd.read_csv('filteredData.csv')

df['DATE'] = pd.to_datetime(df['DATE'])
df['year'] = df['DATE'].dt.year         # Datetime has this attribute already 

df = df.sort_values(['NAME', 'DATE'])   # Output will be in order within each Name

df = (df[df.year.between(2016,2017)]    # Only 2016 and 2017
        .groupby(['year', 'NAME', 'Month'], sort=False)['SNOW']
        .mean().reset_index())

for year,gp in df.groupby('year'):      # Write files separately by year
    gp[['NAME', 'Month', 'SNOW']].to_csv(f'average{year}.csv', index=False)

Upvotes: 1

Related Questions