Reputation: 39
For each NAME/LOCATION, calculate the average snow amount per month. Save the results in two separate .csv files (one for 2016 and the other for 2017) name the files average2016.csv and average2017.csv. I am used Python 3.8 with Panadas. I accomplished doing this with this code:
import numpy as np
import pandas as pd
df = pd.read_csv('filteredData.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df['year'] = pd.DatetimeIndex(df['DATE']).year
df16 = df[(df.year == 2016)]
df17 = df[(df.year == 2017)]
df_2016 = df16.groupby(['NAME', 'Month'])['SNOW'].mean().reset_index()
df_2017 = df17.groupby(['NAME', 'Month'])['SNOW'].mean().reset_index()
df_2016[['NAME', 'Month', 'SNOW']].to_csv('average2016.csv')
df_2017[['NAME', 'Month', 'SNOW']].to_csv('average2017.csv')
This image shows my results for average 2016.
However, the problem that I am having is that the Months are not in Monthly order. I want them to go from January through December for each location. Example: I want the NAME: ADA 0.7 SE, MI US Month's to be May then June. How would I be able to accomplish this? Also is there a way to get rid of the first numbered column?
Upvotes: 1
Views: 543
Reputation: 59579
You can sort
on the DATE column. But then you need to remember to do sort=False
in your groupby, else it will sort there using the string ordering. In addition, your repetitive code for each year can be replaced with a single groupby, adding year
to the grouping keys. Then you'd then write separately into different files and index=False
is how you get rid of the Index.
import numpy as np
import pandas as pd
df = pd.read_csv('filteredData.csv')
df['DATE'] = pd.to_datetime(df['DATE'])
df['year'] = df['DATE'].dt.year # Datetime has this attribute already
df = df.sort_values(['NAME', 'DATE']) # Output will be in order within each Name
df = (df[df.year.between(2016,2017)] # Only 2016 and 2017
.groupby(['year', 'NAME', 'Month'], sort=False)['SNOW']
.mean().reset_index())
for year,gp in df.groupby('year'): # Write files separately by year
gp[['NAME', 'Month', 'SNOW']].to_csv(f'average{year}.csv', index=False)
Upvotes: 1