Reputation: 709
I am trying to calculate the mean monthly error values in my pandas dataframe:
df
output:
timestamp error
0 2016-01-01 1.402315
1 2016-01-02 2.100072
2 2016-01-03 2.127035
3 2016-01-04 1.371335
.. ... ...
362 2016-12-28 2.189454
363 2016-12-29 1.411169
I want to calculate the monthly error values so I attempt:
df=df.groupby(df['timestamp'].dt.strftime('%B'))['error'].mean().sort_values()
This gives me the output:
timestamp
March 1.505680
July 1.728256
June 1.760756
August 1.796719
September 1.830922
October 1.922697
February 2.021558
May 2.100005
December 2.143730
April 2.316473
November 2.491638
January 2.551331
I cant seem to sort these so they are in order jan, feb, mar, april, etc. so that on a plot the months will be in correct order on the xaxis. Any ideas?
Upvotes: 0
Views: 1190
Reputation: 1095
Pandas has built-in functionality for this which makes it simpler (see also How can I Group By Month from a Date field using Python/Pandas)
You can just do
df.groupby(df['timestamp'].dt.month).mean()
and it will do the mean by month and have the index as the month number, so they will be sorted Jan -> Dec.
Upvotes: 0
Reputation: 42926
You can use the following:
# Print small dataframe which was provided
print(df)
timestamp error
0 2016-01-01 1.402315
1 2016-01-02 2.100072
2 2016-01-03 2.127035
3 2016-01-04 1.371335
362 2016-12-28 2.189454
363 2016-12-29 1.411169
# Convert month to monthnumbers
df['timestamp'] = pd.to_datetime(df.timestamp).dt.month
df = df.groupby('timestamp')['error'].mean().sort_index().reset_index()
print(df)
timestamp error
0 1 1.750189
1 12 1.800312
Upvotes: 3