import pandas as pd
from random import randint

months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]

# One row of random amounts per type: 12 months x 3 categories = 36 values each
summary = []
for _ in range( 3 ):
    summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )

index = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
# Two-level columns: (month, category), e.g. ('Jan', 'actual')
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )
summaryDF = pd.DataFrame( summary, index = index, columns = columns )

# Last business day of each month of 2018 ('BM'; pandas >= 2.2 warns and prefers 'BME')
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )

idx = pd.IndexSlice
# Running totals of the 'budgeted' and 'actual' cells in the Difference row
budgetDifference = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
budgetActual = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()
What I want is a DataFrame containing just the (cumulative) actual and budgeted values of the Difference row per month, plus an additional column containing the month dates (I need this extra column for graph generation eventually).
If I do just:
budgetDifference = pd.DataFrame( { 'difference' : budgetDifference, 'months' : budgetMonths } )
I end up with a DataFrame with the difference and months columns:
               difference     months
month category
Jan   budgeted       1097 2018-01-31
Feb   budgeted      11476 2018-02-28
Mar   budgeted      11143 2018-03-30
Apr   budgeted      25082 2018-04-30
May   budgeted      28019 2018-05-31
Jun   budgeted      37164 2018-06-29
Jul   budgeted      36747 2018-07-31
Aug   budgeted      44651 2018-08-31
Sep   budgeted      54283 2018-09-28
Oct   budgeted      62728 2018-10-31
Nov   budgeted      76144 2018-11-30
Dec   budgeted      77781 2018-12-31
However, when I try:
budgetDifference = pd.DataFrame( { 'difference' : budgetDifference, 'actual' : budgetActual, 'months' : budgetMonths } )
I get:
ValueError: array length 12 does not match index length 24
and I am not sure why.
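A small reproduction of the error, using a hypothetical one-row, two-month stand-in for summaryDF (the numbers are made up), shows where the extra rows come from: selecting columns with pd.IndexSlice keeps the 'category' level, so the budgeted and actual Series share no labels, and the DataFrame constructor unions their indexes before comparing that length with the plain date array.

```python
import pandas as pd

# Hypothetical, deterministic stand-in for summaryDF: one row, two months, two categories
columns = pd.MultiIndex.from_product(
    [["Jan", "Feb"], ["actual", "budgeted"]], names=["month", "category"]
)
summaryDF = pd.DataFrame(
    [[1, 2, 3, 4]], index=pd.Index(["Difference"], name="type"), columns=columns
)

idx = pd.IndexSlice
budgeted = summaryDF.loc["Difference", idx[:, "budgeted"]].cumsum()
actual = summaryDF.loc["Difference", idx[:, "actual"]].cumsum()
monthEnds = pd.date_range("2018-01-01", periods=2, freq=pd.offsets.BMonthEnd())

# Both Series keep the 'category' level: ('Jan', 'budgeted') vs ('Jan', 'actual')
print(budgeted.index.intersection(actual.index).size)  # 0 shared labels
print(len(budgeted.index.union(actual.index)))  # 4 rows after the union

# The 2-element date array cannot fill the 4-row union index
try:
    pd.DataFrame({"difference": budgeted, "actual": actual, "months": monthEnds})
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

With the full data the same mechanism gives 12 + 12 = 24 rows against 12 dates, which matches the message in the question.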
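You need to align the indices of the Series that make up your DataFrame. budgetDifference is indexed by (month, 'budgeted') and budgetActual by (month, 'actual'), so their labels never match; the DataFrame constructor takes the union of the two indices (24 rows), and the 12-element budgetMonths array cannot fill it. Giving budgetActual the same index as budgetDifference fixes this: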
res = pd.DataFrame({'difference': budgetDifference,
                    'months': budgetMonths,
                    # reuse budgetDifference's labels so both Series share one 12-row index
                    'actual': pd.Series(budgetActual.values, index=budgetDifference.index)})
print(res)
               difference     months actual
month category
Jan   budgeted       4057 2018-01-31   1592
Feb   budgeted       4550 2018-02-28   2211
Mar   budgeted       3847 2018-03-30   4096
Apr   budgeted      12970 2018-04-30   9588
May   budgeted      17459 2018-05-31  19623
Jun   budgeted      30884 2018-06-29  32347
Jul   budgeted      35258 2018-07-31  37205
Aug   budgeted      35823 2018-08-31  50234
Sep   budgeted      47599 2018-09-28  57188
Oct   budgeted      61258 2018-10-31  71096
Nov   budgeted      65914 2018-11-30  71904
Dec   budgeted      73814 2018-12-31  77308
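An alternative sketch, not taken from the answer above: instead of stripping the labels from one Series with .values, drop the 'category' level from both cumulative Series so they are labelled by month alone, then combine them with pd.concat. The deterministic three-month data, the column names 'budgeted' and 'actual', and the use of pd.offsets.BMonthEnd() in place of the 'BM' alias are assumptions made to keep the example self-contained.

```python
import pandas as pd

# Deterministic stand-in for the question's random data (values are arbitrary)
months = ["Jan", "Feb", "Mar"]
monthyAmounts = ["actual", "budgeted", "difference"]
columns = pd.MultiIndex.from_product([months, monthyAmounts], names=["month", "category"])
summaryDF = pd.DataFrame(
    [list(range(9)), list(range(9, 18)), list(range(18, 27))],
    index=pd.Index(["Income", "Expenses", "Difference"], name="type"),
    columns=columns,
)
# Business month ends, equivalent to freq='BM' without relying on the alias
budgetMonths = pd.date_range("2018-01-01", periods=len(months), freq=pd.offsets.BMonthEnd())

idx = pd.IndexSlice
budgetDifference = summaryDF.loc["Difference", idx[:, "budgeted"]].cumsum()
budgetActual = summaryDF.loc["Difference", idx[:, "actual"]].cumsum()

# Drop the 'category' level so both Series are labelled by month only;
# the labels now match one-to-one and concat aligns them row by row.
res = pd.concat(
    {
        "budgeted": budgetDifference.droplevel("category"),
        "actual": budgetActual.droplevel("category"),
    },
    axis=1,
)
# A DatetimeIndex of matching length is assigned by position, not by label
res["months"] = budgetMonths
print(res)
```

Aligning on a shared month label keeps the pairing explicit, so the result does not depend on both selections happening to come back in the same order.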