HM14
HM14

Reputation: 709

using pandas dataframe to groupby month calculate the mean monthly value

I am trying to calculate the mean monthly error values in my pandas dataframe:

df

output:

    timestamp    error
0   2016-01-01  1.402315
1   2016-01-02  2.100072
2   2016-01-03  2.127035
3   2016-01-04  1.371335
..         ...       ...
362 2016-12-28  2.189454
363 2016-12-29  1.411169

I want to calculate the monthly error values so I attempt:

df=df.groupby(df['timestamp'].dt.strftime('%B'))['error'].mean().sort_values()

This gives me the output:

timestamp
March        1.505680
July         1.728256
June         1.760756
August       1.796719
September    1.830922
October      1.922697
February     2.021558
May          2.100005
December     2.143730
April        2.316473
November     2.491638
January      2.551331

I cant seem to sort these so they are in order jan, feb, mar, april, etc. so that on a plot the months will be in correct order on the xaxis. Any ideas?

Upvotes: 0

Views: 1190

Answers (2)

j sad
j sad

Reputation: 1095

Pandas has built-in functionality for this which makes it simpler (see also How can I Group By Month from a Date field using Python/Pandas)

You can just do

df.groupby(df['timestamp'].dt.month).mean()

and it will do the mean by month and have the index as the month number, so they will be sorted Jan -> Dec.

Upvotes: 0

Erfan
Erfan

Reputation: 42926

You can use the following:

# Print small dataframe which was provided
print(df)
      timestamp     error
0    2016-01-01  1.402315
1    2016-01-02  2.100072
2    2016-01-03  2.127035
3    2016-01-04  1.371335
362  2016-12-28  2.189454
363  2016-12-29  1.411169
# Convert month to monthnumbers
df['timestamp'] = pd.to_datetime(df.timestamp).dt.month

df = df.groupby('timestamp')['error'].mean().sort_index().reset_index()
print(df)
   timestamp     error
0          1  1.750189
1         12  1.800312

Upvotes: 3

Related Questions