bintelligent
bintelligent

Reputation: 23

How to concatenate dataframes using 2 columns as key?

How do i concatenate in pandas using a column as a key like we do in sql ?

df1

col1 col2 
a1   b1  
a2   b2   
a3   b3
a4   b4  

df2

col3 col4
a1     d1
a2     d3
a3     d3

I want to merge/concatenate them on col1 = col3 without getting rid of records that are not in col3 but in are in col 1. Similar to a left join in sql.

df

col1 col2 col4
a1   b1     d1
a2   b2     d2
a3   b3     d3
a4   b4     NA

Upvotes: 0

Views: 1199

Answers (1)

Jeril
Jeril

Reputation: 8521

Does the following work for you:

df1 = pd.DataFrame(
    [
        ['a1', 'b1'],
        ['a2', 'b2'],
        ['a3', 'b3'],
        ['a4', 'b4']
    ],
    columns=['col1', 'col2']
)

df2 = pd.DataFrame(
    [
        ['a1', 'd1'],
        ['a2', 'd2'],
        ['a3', 'd3']
    ],
    columns=['col3', 'col4']
)

df1 = df1.set_index('col1')
df2 = df2.set_index('col3')

dd = df2[df2.index.isin(df1.index)]
# dd.index.names = ['col1']

df = pd.concat([df1, dd], axis=1).reset_index().rename(columns={'index': 'col1'})

# Output
    col1 col2   col4
0   a1   b1     d1
1   a2   b2     d2
2   a3   b3     d3
3   a4   b4     NaN

Upvotes: 2

Related Questions