Reputation: 125
I have 2 pandas DataFrames that I am manipulating in Python, and I would like to combine them based on a certain column (col_1):
#df1
col_1 col_2
0 3 7
1 4 8
2 5 5
#df2
col_1 col_3
0 4 78
1 5 89
and the output should look like this:
col_1 col_2 col_3
0 3 7 0
1 4 8 78
2 5 5 89
I'm not sure about how to go about this using 'best practices'.
Thank you.
Upvotes: 4
Views: 174
Reputation: 19957
Suppose you have:
df1=pd.DataFrame({'col_1': {0: 3, 1: 4, 2: 5}, 'col_2': {0: 7, 1: 8, 2: 5}})
df2=pd.DataFrame({'col_1': {0: 4, 1: 5}, 'col_3': {0: 78, 1: 89}})
You can use merge:
pd.merge(df1,df2,on='col_1',how='left').fillna(0)
Out[22]:
col_1 col_2 col_3
0 3 7 0.0
1 4 8 78.0
2 5 5 89.0
If you need col_3 as int (thanks to piRSquared for his suggestion):
pd.merge(df1,df2,on='col_1',how='left').fillna(0, downcast='infer')
Out[25]:
col_1 col_2 col_3
0 3 7 0
1 4 8 78
2 5 5 89
Upvotes: 4
Reputation: 323326
Maybe combine_first
df1.set_index('col_1').combine_first(df2.set_index('col_1')).fillna(0).reset_index().astype(int)
Out[237]:
col_1 col_2 col_3
0 3 7 0
1 4 8 78
2 5 5 89
Upvotes: 2
Reputation: 294488
You can use join
after using set_index
.
df1.join(df2.set_index('col_1'), on='col_1').fillna(0, downcast='infer')
col_1 col_2 col_3
0 3 7 0
1 4 8 78
2 5 5 89
Upvotes: 4
Reputation: 164773
Mapping a series if often efficient for a single column.
df1 = pd.DataFrame({'col_1': [3, 4, 5],
'col_2': [7, 8, 5]})
df2 = pd.DataFrame({'col_1': [4, 5],
'col_3': [78, 89]})
df1['col_3'] = df1['col_1'].map(df2.set_index('col_1')['col_3']).fillna(0)
# col_1 col_2 col_3
# 0 3 7 0.0
# 1 4 8 78.0
# 2 5 5 89.0
Upvotes: 2