Bryce Booze
Bryce Booze

Reputation: 195

Problem with merging Pandas Dataframes with Columns that don't line up

I am attempting to transpose and merge two pandas dataframes, one containing accounts, the segment which they received their deposit, their deposit information, and what day they received the deposit; the other has the accounts, and withdrawal information. The issue is, for indexing purposes, the segment information from one dataframe should line up with the information of the other, regardless of there being a withdrawal or not.

Notes:

Account Dataframe Code

accounts = DataFrame({'person':[1,1,1,1,1,2,2,2,2,2],
                         'segment':[1,2,3,4,5,1,2,3,4,5],
                         'date_received':[10,20,30,40,50,11,21,31,41,51],
                         'amount_received':[1,2,3,4,5,6,7,8,9,10]})

accounts = accounts.pivot_table(index=["person"], columns=["segment"])

Account Dataframe

               amount_received        date_received
segment        1  2  3  4   5         1   2   3   4   5
person
1              1  2  3  4   5        10  20  30  40  50
2              6  7  8  9  10        11  21  31  41  51

Withdrawal Dataframe Code

withdrawals = DataFrame({'person':[1,1,1,2,2],
                         'withdrawal_segment':[1,1,5,2,3],
                         'withdraw_date':[1,2,3,4,5],
                         'withdraw_amount':[10,20,30,40,50]})

withdrawals = withdrawals.reset_index().pivot_table(index = ['index', 'person'], columns = ['withdrawal_segment'])

Since there can only be unique segments for a person it is required that my column only consists of a unique number once, while still holding all of the data, which is why this dataframe looks so much different.

Withdrawal Dataframe

                      withdraw_date              withdraw_amount
withdrawal_segment    1    2    3    5           1     2     3     5
index person
0     1               1.0  NaN  NaN  NaN        10.0   NaN   NaN   NaN
1     1               2.0  NaN  NaN  NaN        20.0   NaN   NaN   NaN
2     1               NaN  NaN  NaN  3.0         NaN   NaN   NaN  30.0
3     2               NaN  4.0  NaN  NaN         NaN  40.0   NaN   NaN
4     2               NaN  NaN  5.0  NaN         NaN   NaN  50.0   NaN

Merge

merge = accounts.merge(withdrawals, on='person', how='left')

               amount_received        date_received              withdraw_date            withdraw_amount
segment        1  2  3  4   5         1   2   3   4   5          1    2    3    5         1     2     3     5
person
1              1  2  3  4   5        10  20  30  40  50        1.0  NaN  NaN  NaN        10.0   NaN   NaN   NaN
1              1  2  3  4   5        10  20  30  40  50        2.0  NaN  NaN  NaN        20.0   NaN   NaN   NaN
1              1  2  3  4   5        10  20  30  40  50        NaN  NaN  NaN  3.0         NaN   NaN   NaN  30.0
2              6  7  8  9  10        11  21  31  41  51        NaN  4.0  NaN  NaN         NaN  40.0   NaN   NaN
2              6  7  8  9  10        11  21  31  41  51        NaN  NaN  5.0  NaN         NaN   NaN  50.0   NaN

The problem with the merged dataframe is that segments from the withdrawal dataframe aren't lined up with the accounts segments. The desired dataframe should look something like:

               amount_received        date_received              withdraw_date               withdraw_amount
segment        1  2  3  4   5         1   2   3   4   5         1    2    3    4    5        1     2     3    4     5
person
1              1  2  3  4   5        10  20  30  40  50        1.0  NaN  NaN  NaN   NaN     10.0   NaN   NaN   NaN   NaN
1              1  2  3  4   5        10  20  30  40  50        2.0  NaN  NaN  NaN   NaN     20.0   NaN   NaN   NaN   NaN
1              1  2  3  4   5        10  20  30  40  50        NaN  NaN  NaN  NaN   3.0     NaN    NaN   NaN   NaN   30.0
2              6  7  8  9  10        11  21  31  41  51        NaN  4.0  NaN  NaN   NaN     NaN    40.0  NaN   NaN   NaN
2              6  7  8  9  10        11  21  31  41  51        NaN  NaN  5.0  NaN   NaN     NaN    NaN   50.0  NaN   NaN

My problem is that I can't seem to merge across both person and segments. I've thought about inserting a row and column, but because I don't know which segments are and aren't going to have a withdrawal this gets difficult. Is it possible to merge the dataframes so that they line up across both people and segments? Thanks!

Upvotes: 3

Views: 179

Answers (1)

BENY
BENY

Reputation: 323276

Method 1 , using reindex

withdrawals=withdrawals.reindex(pd.MultiIndex.from_product([withdrawals.columns.levels[0],accounts.columns.levels[1]]),axis=1)
merge = accounts.merge(withdrawals, on='person', how='left')
merge
Out[79]: 
        amount_received              date_received                  \
segment               1  2  3  4   5             1   2   3   4   5   
person                                                               
1                     1  2  3  4   5            10  20  30  40  50   
1                     1  2  3  4   5            10  20  30  40  50   
1                     1  2  3  4   5            10  20  30  40  50   
2                     6  7  8  9  10            11  21  31  41  51   
2                     6  7  8  9  10            11  21  31  41  51   
        withdraw_amount                       withdraw_date                     
segment               1     2     3   4     5             1    2    3   4    5  
person                                                                          
1                  10.0   NaN   NaN NaN   NaN           1.0  NaN  NaN NaN  NaN  
1                  20.0   NaN   NaN NaN   NaN           2.0  NaN  NaN NaN  NaN  
1                   NaN   NaN   NaN NaN  30.0           NaN  NaN  NaN NaN  3.0  
2                   NaN  40.0   NaN NaN   NaN           NaN  4.0  NaN NaN  NaN  
2                   NaN   NaN  50.0 NaN   NaN           NaN  NaN  5.0 NaN  NaN 

Method 2 , using unstack and stack

merge = accounts.merge(withdrawals, on='person', how='left')
merge.stack(dropna=False).unstack()
Out[82]: 
        amount_received              date_received                  \
segment               1  2  3  4   5             1   2   3   4   5   
person                                                               
1                     1  2  3  4   5            10  20  30  40  50   
1                     1  2  3  4   5            10  20  30  40  50   
1                     1  2  3  4   5            10  20  30  40  50   
2                     6  7  8  9  10            11  21  31  41  51   
2                     6  7  8  9  10            11  21  31  41  51   
        withdraw_amount                       withdraw_date                     
segment               1     2     3   4     5             1    2    3   4    5  
person                                                                          
1                  10.0   NaN   NaN NaN   NaN           1.0  NaN  NaN NaN  NaN  
1                  20.0   NaN   NaN NaN   NaN           2.0  NaN  NaN NaN  NaN  
1                   NaN   NaN   NaN NaN  30.0           NaN  NaN  NaN NaN  3.0  
2                   NaN  40.0   NaN NaN   NaN           NaN  4.0  NaN NaN  NaN  
2                   NaN   NaN  50.0 NaN   NaN           NaN  NaN  5.0 NaN  NaN  

Upvotes: 1

Related Questions