ju.
ju.

Reputation: 1096

Use pandas to combine/merge parent table with child table

I have a parent table df0, with two levels code Code1, Code2, and Message. Multiple child tables as df1 which stores lower level Code2 and Messages. As you can see, "df1" is indicated as index in table df0.

    import pandas as pd

    s1 = pd.Series([1, 0, 'A'])
    s2 = pd.Series([2, 'df1', 'B'])
    s3 = pd.Series([3, 0, 'C'])
    df0 = pd.DataFrame([list(s1), list(s2), list(s3)],  columns =  ["Code1", "Code2", "Message"])

    df0   

         Code1  Code2   Message
    0     1     0       A
    1     2     df1     B
    2     3     0       C

    s4 = pd.Series([0, 'B1'])
    s5 = pd.Series([1, 'B2'])
    s6 = pd.Series([2, 'B3'])
    df1 = pd.DataFrame([list(s4), list(s5), list(s6)],  columns =  ["Code2", "Message"])


    df1
                Code2   Message
        0       0       B1
        1       1       B2
        2       2       B3

I want to have a larger table essential put df1 back to df0 and make a flat merged table. Thank you for your help.


        Code1   Code2   Message
0       1       0       A
1       2               B
2       2       0       B1
3       2       1       B2
4       2       2       B3
5       3       0       C

Upvotes: 2

Views: 486

Answers (1)

Rich Andrews
Rich Andrews

Reputation: 1670

Use merge. The below provides an intermediate result that can be further processed to obtain the desired output.

import pandas as pd
import numpy as np
print(pd.__version__)

s1 = pd.Series([1, 0, 'A'])
s2 = pd.Series([2, 'df1', 'B'])
s3 = pd.Series([3, 0, 'C'])
df0 = pd.DataFrame([list(s1), list(s2), list(s3)],  columns =  ["Code1", "Code2", "Message"])

s4 = pd.Series([0, 'B1'])
s5 = pd.Series([1, 'B2'])
s6 = pd.Series([2, 'B3'])
df1 = pd.DataFrame([list(s4), list(s5), list(s6)],  columns =  ["Code2", "Message"])
join_key = 'df1'
df1[join_key] = join_key

df2 = pd.concat([df0[df0['Code2'] == join_key], df1], sort=True)

result = df0.merge(df2, left_on=['Code2'], right_on=['df1'], how='outer', suffixes=('_l', '_r'))
print (result)

Result

0.24.2
   Code1_l Code2_l Message_l  Code1_r Code2_r Message_r  df1
0      1.0       0         A      NaN     NaN       NaN  NaN
1      3.0       0         C      NaN     NaN       NaN  NaN
2      2.0     df1         B      NaN       0        B1  df1
3      2.0     df1         B      NaN       1        B2  df1
4      2.0     df1         B      NaN       2        B3  df1
5      NaN     NaN       NaN      2.0     df1         B  NaN

Upvotes: 3

Related Questions