Reputation: 1966
Background:
I have a data-frame df
where I have a few columns indicating number of transaction for a given date. The data stretches for 10 years. The data frame looks like:
Date ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
2006-01-02 1106877 7280093 955871 148134 901928
2006-01-03 3498020 16274003 1963392 683766 2429254
2006-01-04 2844613 11436553 2149895 399354 3223708
2006-01-05 2661226 7104800 2137384 560498 1649814
2006-01-08 3040459 14577664 1437820 492849 2690357
2006-01-09 4346403 12040685 2891248 608287 3273293
2006-01-10 4367498 15002163 3960253 550975 3514500
2006-01-11 3598690 15928934 3875594 808685 3487634
2006-01-12 3542926 9366744 1874046 807708 1838725
2006-01-15 2291792 7491041 1736446 569285 2465805
2006-01-16 3352969 10613706 1596676 1071833 2763514
2006-01-17 4515208 23156310 4200233 1401628 4487772
2006-01-18 3027208 19241218 2631980 816190 4391474
2006-01-19 3912358 16356046 3094409 682497 6956628
2006-01-22 3933020 15533592 3560834 948459 4655687
2006-01-23 2412419 17092104 3204438 967484 3556701
2006-01-24 7624649 34777198 9997472 1156034 10233959
2006-01-26 2683581 24918357 1812563 1841253 3645258
2006-01-29 2106490 15171772 1811530 506192 5302280
2006-01-30 4817301 22417229 3126666 1078237 4085055
2006-01-31 5190244 18597719 2373929 4558877 8095117
2006-02-01 5899027 15911692 2131606 3622954 8167766
Question:
I want to accumulate the the columns on a day of the week basis, with the output of a new df which should look like this:
DayOfWeek ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Monday 16035969 69443817 11774899 3873975 14580491
Tueday 25195619 107807393 22495279 8351280 28760602
Wednesday 18053119 87436754 12601638 7488436 22915840
Thursday 12223000 47999362 8917369 2556895 15747447
Friday
Saturday
Sunday 11371761 52774069 8546630 2516785 15114129
How could this be done?
Upvotes: 2
Views: 41
Reputation: 862691
Use groupby
and aggregate sum
with DatetimeIndex.weekday_name
and for ordering of days are possible 2 ways - ordered categoricals or reindex
:
cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
days = pd.Categorical(df.index.weekday_name, categories=cats, ordered=True)
df = df.groupby(days).sum()
print (df)
ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Monday 16035969 69443817 11774899 3873975 14580491
Tuesday 25195619 107807393 22495279 8351280 28760602
Wednesday 15369538 62518397 10789075 5647183 19270582
Thursday 12800091 57745947 8918402 3891956 14090425
Friday 0 0 0 0 0
Saturday 0 0 0 0 0
Sunday 11371761 52774069 8546630 2516785 15114129
Alternative solution:
cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
df = df.groupby(df.index.weekday_name).sum().reindex(cats)
print (df)
ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Date
Monday 16035969.0 69443817.0 11774899.0 3873975.0 14580491.0
Tuesday 25195619.0 107807393.0 22495279.0 8351280.0 28760602.0
Wednesday 15369538.0 62518397.0 10789075.0 5647183.0 19270582.0
Thursday 12800091.0 57745947.0 8918402.0 3891956.0 14090425.0
Friday NaN NaN NaN NaN NaN
Saturday NaN NaN NaN NaN NaN
Sunday 11371761.0 52774069.0 8546630.0 2516785.0 15114129.0
df = df.groupby(df.index.weekday_name).sum().reindex(cats, fill_value=0)
print (df)
ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Date
Monday 16035969 69443817 11774899 3873975 14580491
Tuesday 25195619 107807393 22495279 8351280 28760602
Wednesday 15369538 62518397 10789075 5647183 19270582
Thursday 12800091 57745947 8918402 3891956 14090425
Friday 0 0 0 0 0
Saturday 0 0 0 0 0
Sunday 11371761 52774069 8546630 2516785 15114129
Upvotes: 3