dbulgakov
dbulgakov

Reputation: 81

How do I join two dataframes based on values in selected columns?

I am trying to join (merge) two dataframes based on values in each column. For instance, to merge by values in columns in A and B. So, having df1

    A  B  C  D  L
0   4  3  1  5  1
1   5  7  0  3  2
2   3  2  1  6  4

And df2

    A  B  E  F  L
0   4  3  4  5  1 
1   5  7  3  3  2 
2   3  8  5  5  5

I want to get a d3 with such structure

    A     B     C     D     E     F     L
0   4     3     1     5     4     5     1
1   5     7     0     3     3     3     2
2   3     2     1     6     Nan   Nan   4
3   3     8     Nan   Nan   5     5     5

Can you, please help me? I've tried both merge and join methods but havent succeed.

Upvotes: 3

Views: 334

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

UPDATE: (for updated DFs and new desired DF)

In [286]: merged = pd.merge(df1, df2, on=['A','B'], how='outer', suffixes=('','_y'))

In [287]: merged.L.fillna(merged.pop('L_y'), inplace=True)

In [288]: merged
Out[288]:
   A  B    C    D    L    E    F
0  4  3  1.0  5.0  1.0  4.0  5.0
1  5  7  0.0  3.0  2.0  3.0  3.0
2  3  2  1.0  6.0  4.0  NaN  NaN
3  3  8  NaN  NaN  5.0  5.0  5.0

Data:

In [284]: df1
Out[284]:
   A  B  C  D  L
0  4  3  1  5  1
1  5  7  0  3  2
2  3  2  1  6  4

In [285]: df2
Out[285]:
   A  B  E  F  L
0  4  3  4  5  1
1  5  7  3  3  2
2  3  8  5  5  5

OLD answer:

you can use pd.merge(..., how='outer') method:

In [193]: pd.merge(a,b, on=['A','B'], how='outer')
Out[193]:
   A  B    C    D    E    F
0  4  3  1.0  5.0  4.0  5.0
1  5  7  0.0  3.0  3.0  3.0
2  3  2  1.0  6.0  NaN  NaN
3  3  8  NaN  NaN  5.0  5.0

Data:

In [194]: a
Out[194]:
   A  B  C  D
0  4  3  1  5
1  5  7  0  3
2  3  2  1  6

In [195]: b
Out[195]:
   A  B  E  F
0  4  3  4  5
1  5  7  3  3
2  3  8  5  5

Upvotes: 3

Related Questions