Reputation: 23
How do i concatenate in pandas using a column as a key like we do in sql ?
df1
col1 col2
a1 b1
a2 b2
a3 b3
a4 b4
df2
col3 col4
a1 d1
a2 d3
a3 d3
I want to merge/concatenate them on col1 = col3 without getting rid of records that are not in col3 but in are in col 1. Similar to a left join in sql.
df
col1 col2 col4
a1 b1 d1
a2 b2 d2
a3 b3 d3
a4 b4 NA
Upvotes: 0
Views: 1199
Reputation: 8521
Does the following work for you:
df1 = pd.DataFrame(
[
['a1', 'b1'],
['a2', 'b2'],
['a3', 'b3'],
['a4', 'b4']
],
columns=['col1', 'col2']
)
df2 = pd.DataFrame(
[
['a1', 'd1'],
['a2', 'd2'],
['a3', 'd3']
],
columns=['col3', 'col4']
)
df1 = df1.set_index('col1')
df2 = df2.set_index('col3')
dd = df2[df2.index.isin(df1.index)]
# dd.index.names = ['col1']
df = pd.concat([df1, dd], axis=1).reset_index().rename(columns={'index': 'col1'})
# Output
col1 col2 col4
0 a1 b1 d1
1 a2 b2 d2
2 a3 b3 d3
3 a4 b4 NaN
Upvotes: 2