Reputation: 8722
What I am looking to create is a dataframe which looks like:
amount months category
0 6460 2018-01-31 budgeted
1 7905 2018-01-31 actual
2 11509 2018-02-28 budgeted
3 21502 2018-02-28 actual
...
...
The sample code I have and base datafrom I am working with is:
import pandas as pd
import string
import altair as alt
from random import randint
#
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]
summary = []
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
index = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )
summaryDF = pd.DataFrame( summary, index = index, columns = columns )
#
# From this point, I am trying to transform the summaryDF into something
# I can use in a different context...
#
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )
idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()
budgeted.index = budgetMonths
actual.index = budgetMonths
budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })
print( budgetedDF )
print( actualDF )
df3 = pd.merge( budgetedDF, actualDF, on = 'months' )
df3
df3 looks like:
amount_x months category_x amount_y category_y
0 6460 2018-01-31 budgeted 7905 actual
1 11509 2018-02-28 budgeted 21502 actual
...
...
I think I am close to getting what I want...just need the final merge step.
Upvotes: 0
Views: 86
Reputation: 402263
"Merge" these DataFrames using pd.concat
.
df3 = (pd.concat([budgetedDF, actualDF])
.sort_index()
.reset_index(drop=True)
)
df3
amount category months
0 2183 budgeted 2018-01-31
1 3612 actual 2018-01-31
2 8902 budgeted 2018-02-28
3 3357 actual 2018-02-28
4 9956 budgeted 2018-03-30
5 2828 actual 2018-03-30
6 14475 budgeted 2018-04-30
7 2990 actual 2018-04-30
8 25385 budgeted 2018-05-31
9 4446 actual 2018-05-31
10 19119 actual 2018-06-29
11 29119 budgeted 2018-06-29
12 27296 actual 2018-07-31
13 40869 budgeted 2018-07-31
14 38443 actual 2018-08-31
15 43400 budgeted 2018-08-31
16 47978 actual 2018-09-28
17 52686 budgeted 2018-09-28
18 49612 actual 2018-10-31
19 63384 budgeted 2018-10-31
20 49272 actual 2018-11-30
21 74107 budgeted 2018-11-30
22 83653 budgeted 2018-12-31
23 56907 actual 2018-12-31
However, you might prefer this representation instead:
df3 = (pd.concat([budgetedDF, actualDF])
.drop('months', 1)
.set_index('category', append=True)
.unstack()
)
df3
amount
category actual budgeted
2018-01-31 3612 2183
2018-02-28 3357 8902
2018-03-30 2828 9956
2018-04-30 2990 14475
2018-05-31 4446 25385
2018-06-29 19119 29119
2018-07-31 27296 40869
2018-08-31 38443 43400
2018-09-28 47978 52686
2018-10-31 49612 63384
2018-11-30 49272 74107
2018-12-31 56907 83653
Upvotes: 1
Reputation: 18201
Sounds like what you want is
pd.concat([budgetedDF, actualDF]).sort_values('months').reset_index(drop=True)
Upvotes: 0