Alice_inwonderland
Alice_inwonderland

Reputation: 348

Pandas concat outer join doesn't work properly

I'm trying to join 2 dataframes. I will explain using my codes below. My apology because I don't know how to show table outputs, so please run the code and you will see what I mean.

Setup:

df1 = pd.DataFrame({'A': ['A2', 'A3', 'A6', 'A7'],
                    'B': ['B2', 'B3', 'B6', 'B7'],
                    'C': ['C2', 'C3', 'C6', 'C7']})


df2 = pd.DataFrame({'A_': ['A2', 'A3', 'A4'],
                    'B_': ['B2', 'B3', 'B4'],
                    'C_': ['C2', 'C3', 'C4']})

I tried the following method:

new_joined = pd.concat([df1, df2],axis=1,join='outer')

The output is not what i want. What I want is a joined df like this:

joined = pd.DataFrame({'A': ['A2', 'A3', 'A6', 'Nan','A7'],
                    'B': ['B2', 'B3', 'B6','Nan','B7'],
                    'C': ['C2', 'C3', 'C6','Nan','C7'],
                   'A_': ['A2', 'A3', 'Nan','A4','Nan'],
                   'B_': ['B2', 'B3', 'Nan','B4','Nan'],
                   'C_': ['C2', 'C3', 'Nan','C4','Nan']})
joined = joined[['A','B','C','A_','B_','C_']]

Basically, if a row of df1 doesn't match the other the other row of df2, I want that to be Nan, but the function I tried just put [A6,B6,C6] and [A4,B4,C4] on the same line. Please advise!

Upvotes: 2

Views: 2612

Answers (2)

bjschoenfeld
bjschoenfeld

Reputation: 412

I think this is related to this question. You are looking for a full outer join. Using your example, here is the solution:

df1 = pd.DataFrame({'A': ['A2', 'A3', 'A6', 'A7'],
                    'B': ['B2', 'B3', 'B6', 'B7'],
                    'C': ['C2', 'C3', 'C6', 'C7']})


df2 = pd.DataFrame({'A_': ['A2', 'A3', 'A4'],
                    'B_': ['B2', 'B3', 'B4'],
                    'C_': ['C2', 'C3', 'C4']})

df1.merge(df2, how='outer', left_on=list(df1.columns), right_on=list(df2.columns))

which outputs

     A    B    C   A_   B_   C_
0   A2   B2   C2   A2   B2   C2
1   A3   B3   C3   A3   B3   C3
2   A6   B6   C6  NaN  NaN  NaN
3   A7   B7   C7  NaN  NaN  NaN
4  NaN  NaN  NaN   A4   B4   C4

Upvotes: 0

Mark Wang
Mark Wang

Reputation: 2757

pd.concat is not working because it aligns on indexes (row or column) rather than on arbitrary columns. You're probably looking form merge,

df1.merge(df2,left_on=['A','B','C'],right_on=['A_','B_','C_'],how='outer')

Upvotes: 2

Related Questions