pandas true outer join?

Question

How do you get a true outer join in pandas? Meaning that it actually gives you the entire output instead of combining the columns to merge on. This is kind of silly in my opinion because it then makes it hard to determine what kind of operation to do on a row. I do this all the time to detect whether I should Insert, Update, or Delete data, however I always have to create extra copies of the merge on columns, which is just a bunch of overhead (sometimes massive amounts) on certain data sets.

example:

import pandas as pd

keys = ["A","B"]

df1 = pd.DataFrame({"A":[1,2,3],"B":["one","two","three"],"C":["testThis","testThat", "testThis"],"D":[None,hash("B"),hash("C")]})
df2 = pd.DataFrame({"A":[2,3,4],"B":["two","three","four"],"C":["testThis","testThat", "testThis"], "D":[hash("G"),hash("C"),hash("D")]})

fullJoinDf = df1.merge(df2, how="outer", left_on=keys, right_on=keys, suffixes=["","_r"])
display(
    fullJoinDf,
)

    A   B       C           D               C_r          D_r
0   1   one     testThis    NaN             NaN          NaN
1   2   two     testThat    -3.656526e+18   testThis    -9.136326e+18
2   3   three   testThis    -8.571400e+18   testThat    -8.571400e+18
3   4   four    NaN         NaN             testThis    -4.190116e+17

Noticed how it output A & B magically combined to a single set of columns. What I want is what I would get in SQL outerjoins etc like:

    A    B      C           D               A_r  B_r     C_r        D_r
0   1    one    testThis    NaN             NaN  NaN     NaN        NaN     
1   2    two    testThat    -3.656526e+18   2    two     testThis   -9.136326e+18
2   3    three  testThis    -8.571400e+18   3    three   testThat   -8.571400e+18
3   NaN  NaN    NaN         NaN             4    four    testThis   -4.190116e+17

Edit for @Felipe Whitaker

Using concat:

df3 = df1.copy().set_index(keys)
df4 = df2.copy().set_index(keys)
t = pd.concat([df3,df4], axis=1)
t.reset_index(), 

    A   B       C           D               C           D
0   1   one     testThis    NaN             NaN         NaN
1   2   two     testThat    -3.656526e+18   testThis    -9.136326e+18
2   3   three   testThis    -8.571400e+18   testThat    -8.571400e+18
3   4   four    NaN         NaN             testThis    -4.190116e+17

EDIT Examples* Given the answers I'm posting more tests, so anyone else who stumbles on this can see a few more of the "gatcha" variations I've discovered while doing this.

import pandas as pd

keys = ["A","B"]

df1 = pd.DataFrame({"A":[1,2,3],"B":["one","two","three"],"C":["testThis","testThat", "testThis"],"D":[None,hash("B"),hash("C")]})
df2 = pd.DataFrame({"A":[2,3,4],"B":["two","three","four"],"C":["testThis","testThat", "testThis"], "D":[hash("G"),hash("C"),hash("D")]})

df3 = df1.copy()
df4 = df2.copy()
df3.index = df3[keys]
df4.index = df4[keys]

df5 = df1.copy().set_index(keys)
df6 = df2.copy().set_index(keys)


fullJoinDf = df5.merge(df6, how="outer", left_on=keys, right_on=keys, suffixes=["","_r"])
fullJoinDf_2 = df3.merge(df4, how="outer", left_index=True, right_index=True, suffixes=["","_r"])
t = pd.concat([df1,df2], axis=1, keys=["A","B"])
display(
    df3.index,
    df5.index,
    fullJoinDf,
    fullJoinDf_2,
    t,
)

Index([(1, 'one'), (2, 'two'), (3, 'three')], dtype='object')
MultiIndex([(1,   'one'),
            (2,   'two'),
            (3, 'three')],
           names=['A', 'B'])

    A   B       C           D               C_r         D_r
0   1   one     testThis    NaN             NaN         NaN
1   2   two     testThat    -3.656526e+18   testThis    -9.136326e+18
2   3   three   testThis    -8.571400e+18   testThat    -8.571400e+18
3   4   four    NaN         NaN             testThis    -4.190116e+17

            A    B      C           D               A_r  B_r    C_r        D_r
(1, one)    1.0  one    testThis    NaN             NaN  NaN    NaN        NaN
(2, two)    2.0  two    testThat    -3.656526e+18   2.0  two    testThis    -9.136326e+18
(3, three)  3.0  three  testThis    -8.571400e+18   3.0  three  testThat    -8.571400e+18
(4, four)   NaN  NaN    NaN         NaN             4.0  four   testThis    -4.190116e+17

    A   B       C           D               A   B       C           D
0   1   one     testThis    NaN             2   two     testThis    -9136325526401183790
1   2   two     testThat    -3.656526e+18   3   three   testThat    -8571400026927442160
2   3   three   testThis    -8.571400e+18   4   four    testThis    -419011572131270498

Code Different · Accepted Answer

If you don't care abut the original indexes at all:

df1.index = df1[keys]
df2.index = df2[keys]

fullJoinDf = df1.merge(df2, how="outer", left_index=True, right_index=True, suffixes=["","_r"])

Result:

     A      B         C             D  A_r    B_r       C_r           D_r
0  1.0    one  testThis           NaN  NaN    NaN       NaN           NaN
1  2.0    two  testThat  6.368540e+18  2.0    two  testThis -6.457388e+18
2  3.0  three  testThis -7.490461e+18  3.0  three  testThat -7.490461e+18
3  NaN    NaN       NaN           NaN  4.0   four  testThis  4.344649e+18

pandas true outer join?

Answers (2)

Related Questions