ericg
ericg

Reputation: 8722

How to merge two dataframes on a timeseries

What I am looking to create is a dataframe which looks like:

    amount  months      category    
0   6460    2018-01-31  budgeted    
1   7905    2018-01-31  actual  
2   11509   2018-02-28  budgeted    
3   21502   2018-02-28  actual 
...
...

The sample code I have and base datafrom I am working with is:

import pandas as pd
import string
import altair as alt

from random import randint

# 
# This is the general form of my 'real' dataframe. It is not subject to change.
#
months                  = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts           = [ "actual", "budgeted", "difference" ]

summary = []

summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ] )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )
summary.append( [ randint( -1000, 15000 ) for x in range( 0, len( months ) * len( monthyAmounts ) ) ]  )

index   = pd.Index( [ 'Income', 'Expenses', 'Difference' ], name = 'type' )
columns = pd.MultiIndex.from_product( [months, monthyAmounts], names=['month', 'category'] )

summaryDF = pd.DataFrame( summary, index = index, columns = columns )

#
# From this point, I am trying to transform the summaryDF into something 
# I can use in a different context...
#

budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )

idx = pd.IndexSlice
budgeted = summaryDF.loc[ 'Difference', idx[:, 'budgeted' ] ].cumsum()
actual   = summaryDF.loc[ 'Difference', idx[:, 'actual' ] ].cumsum()

budgeted.index = budgetMonths
actual.index = budgetMonths

budgetedDF = pd.DataFrame( { 'amount': budgeted, 'months': budgetMonths, 'category': 'budgeted' })
actualDF   = pd.DataFrame( { 'amount': actual, 'months': budgetMonths, 'category': 'actual' })

print( budgetedDF )
print( actualDF )

df3 = pd.merge( budgetedDF, actualDF, on = 'months' )
df3 

df3 looks like:

    amount_x    months      category_x  amount_y    category_y
0   6460        2018-01-31  budgeted    7905        actual
1   11509       2018-02-28  budgeted    21502       actual
...
...

I think I am close to getting what I want...just need the final merge step.

Upvotes: 0

Views: 86

Answers (2)

cs95
cs95

Reputation: 402263

"Merge" these DataFrames using pd.concat.

df3 = (pd.concat([budgetedDF, actualDF])
         .sort_index()
         .reset_index(drop=True)
)

df3

    amount  category     months
0     2183  budgeted 2018-01-31
1     3612    actual 2018-01-31
2     8902  budgeted 2018-02-28
3     3357    actual 2018-02-28
4     9956  budgeted 2018-03-30
5     2828    actual 2018-03-30
6    14475  budgeted 2018-04-30
7     2990    actual 2018-04-30
8    25385  budgeted 2018-05-31
9     4446    actual 2018-05-31
10   19119    actual 2018-06-29
11   29119  budgeted 2018-06-29
12   27296    actual 2018-07-31
13   40869  budgeted 2018-07-31
14   38443    actual 2018-08-31
15   43400  budgeted 2018-08-31
16   47978    actual 2018-09-28
17   52686  budgeted 2018-09-28
18   49612    actual 2018-10-31
19   63384  budgeted 2018-10-31
20   49272    actual 2018-11-30
21   74107  budgeted 2018-11-30
22   83653  budgeted 2018-12-31
23   56907    actual 2018-12-31

However, you might prefer this representation instead:

df3 = (pd.concat([budgetedDF, actualDF])
         .drop('months', 1)
         .set_index('category', append=True)
         .unstack()
)

df3
           amount         
category   actual budgeted
2018-01-31   3612     2183
2018-02-28   3357     8902
2018-03-30   2828     9956
2018-04-30   2990    14475
2018-05-31   4446    25385
2018-06-29  19119    29119
2018-07-31  27296    40869
2018-08-31  38443    43400
2018-09-28  47978    52686
2018-10-31  49612    63384
2018-11-30  49272    74107
2018-12-31  56907    83653

Upvotes: 1

fuglede
fuglede

Reputation: 18201

Sounds like what you want is

pd.concat([budgetedDF, actualDF]).sort_values('months').reset_index(drop=True)

Upvotes: 0

Related Questions