Reputation: 125
I have a set of data that is indexed by dates. Is there an easy way to obtain yearly totals and monthly averages from this data set?
a b c d e
Statement Date
2003-12-29 655.0 54.51 0.083221 0.0 4.70
2004-01-28 978.0 82.69 0.084550 0.0 4.70
2004-02-25 905.0 78.58 0.086829 0.0 4.70
2004-03-29 1099.0 95.90 0.087261 0.0 4.70
2004-04-28 1070.0 93.88 0.087738 0.0 4.70
2004-05-26 656.0 57.99 0.088399 0.0 4.70
2004-06-28 527.0 43.92 0.083340 0.0 4.70
2004-07-28 399.0 32.79 0.082180 0.0 4.70
2004-08-27 359.0 30.53 0.085042 0.0 4.70
2004-09-28 381.0 34.76 0.091234 0.0 4.70
2004-10-26 471.0 45.25 0.096072 0.0 4.70
2004-11-24 967.0 85.99 0.088925 0.0 4.70
2004-12-28 1175.0 101.49 0.086374 0.0 4.70
2005-01-27 849.0 80.78 0.095147 0.0 4.70
2005-02-24 641.0 61.24 0.095538 0.0 4.70
2005-03-29 821.0 77.10 0.093910 0.0 4.70
2005-04-27 647.0 64.49 0.099675 0.0 4.70
2005-05-26 514.0 49.54 0.096381 0.0 4.70
2005-06-28 411.0 39.78 0.096788 0.0 4.70
2005-07-27 411.0 39.70 0.096594 0.0 4.70
2005-08-29 834.0 83.20 0.099760 0.0 4.70
2005-09-28 589.0 59.67 0.101307 0.0 4.70
2005-10-26 476.0 52.29 0.109853 0.0 4.70
2005-11-28 703.0 77.26 0.109900 0.0 4.70
2005-12-28 758.0 90.35 0.119195 0.0 4.70
2006-01-27 668.0 71.12 0.106467 99.0 10.54
2006-02-24 830.0 88.17 0.106229 13.0 4.70
2006-03-29 859.0 92.09 0.107206 0.0 4.70
2006-04-26 557.0 59.41 0.106661 2.0 4.70
2006-05-26 732.0 76.88 0.105027 27.0 4.70
I would like to create annual totals of column a, as well as an average monthly usage (i.e. the average of the column a values for each calendar month, e.g. January of 2004, 2005, and 2006). I tried to use pd.Grouper but couldn't get it to work. It would be nice to output the new values to a new DataFrame if possible. Any help is appreciated.
Please let me know if anything is unclear.
Upvotes: 0
Views: 1971
Reputation: 862771
To get the months in calendar order in the output, I think you need an ordered CategoricalIndex built from DatetimeIndex.month_name (or DatetimeIndex.strftime). Group by it and aggregate with mean:
import pandas as pd

# month names in calendar order, used as the ordered categories
cats = ['January','February','March','April','May','June','July','August',
        'September','October','November','December']
idx = pd.CategoricalIndex(df.index.month_name(), categories=cats, ordered=True)
# alternative solution
#idx = pd.CategoricalIndex(df.index.strftime('%B'), categories=cats, ordered=True)
# average every column per calendar month, across all years
df1 = df.groupby(idx).mean()
print(df1)
a b c d e
Statement Date
January 831.666667 78.196667 0.095388 33.000000 6.646667
February 792.000000 75.996667 0.096199 4.333333 4.700000
March 926.333333 88.363333 0.096126 0.000000 4.700000
April 758.000000 72.593333 0.098025 0.666667 4.700000
May 634.000000 61.470000 0.096602 9.000000 4.700000
June 469.000000 41.850000 0.090064 0.000000 4.700000
July 405.000000 36.245000 0.089387 0.000000 4.700000
August 596.500000 56.865000 0.092401 0.000000 4.700000
September 485.000000 47.215000 0.096271 0.000000 4.700000
October 473.500000 48.770000 0.102962 0.000000 4.700000
November 835.000000 81.625000 0.099413 0.000000 4.700000
December 862.666667 82.116667 0.096263 0.000000 4.700000
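If only column a is needed, as the question asks, a minimal sketch might look like the following. It assumes the index is already a DatetimeIndex and uses a small hand-copied subset of the question's rows (January and February only). The name monthly_avg_a is just illustrative. Grouping by the integer month number instead of the month name keeps the result in calendar order without a categorical.

```python
import pandas as pd

# Subset of the question's data (assumption: only Jan/Feb rows, column "a" only).
df = pd.DataFrame(
    {"a": [978.0, 905.0, 849.0, 641.0, 668.0, 830.0]},
    index=pd.DatetimeIndex(
        ["2004-01-28", "2004-02-25", "2005-01-27",
         "2005-02-24", "2006-01-27", "2006-02-24"],
        name="Statement Date",
    ),
)

# Group column "a" by month number (1-12) and average across years.
# Integer keys sort chronologically, so no ordered categorical is needed.
monthly_avg_a = df["a"].groupby(df.index.month).mean()
print(monthly_avg_a)
```

On this subset, the January and February averages should agree with the values for column a in the full output above, since the full data has the same three rows for each of those months.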
For the yearly totals, group by DatetimeIndex.year and aggregate with sum:
# total every column per calendar year
df2 = df.groupby(df.index.year).sum()
print(df2)
a b c d e
Statement Date
2003 655.0 54.51 0.083221 0.0 4.70
2004 8987.0 783.77 1.047944 0.0 56.40
2005 7654.0 775.40 1.214048 0.0 56.40
2006 3646.0 387.67 0.531590 141.0 29.34
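Since the question asks for annual totals of column a in a new DataFrame, here is a small sketch of that variant. It again assumes a DatetimeIndex and uses a hand-copied subset of the rows. The names yearly_total_a, "Year", and "a_total" are illustrative choices, not anything from the original data.

```python
import pandas as pd

# Subset of the question's data (assumption: a few rows, column "a" only).
df = pd.DataFrame(
    {"a": [655.0, 978.0, 905.0, 849.0, 641.0]},
    index=pd.DatetimeIndex(
        ["2003-12-29", "2004-01-28", "2004-02-25", "2005-01-27", "2005-02-24"],
        name="Statement Date",
    ),
)

# Sum column "a" per calendar year, then turn the resulting Series
# into a one-column DataFrame with a descriptive index name.
yearly_total_a = (
    df["a"].groupby(df.index.year).sum()
    .rename_axis("Year")
    .to_frame("a_total")
)
print(yearly_total_a)
```

Selecting the column before grouping avoids summing columns such as c, where a yearly total may not be meaningful.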
Upvotes: 2