Reputation: 99
I have a dataframe df1
that holds rows for thousands of trades between a certain country and the United Kingdom. I also have another dataframe df2
that holds a few rows for distances between certain countries and the United Kingdom.
Example:
x1 = [{'partner': "Afghanistan", 'trade_value':100},
{'partner':"Zambia",'trade_value':110},
{'partner': "Germany",'trade_value':120},
{'partner': "Afghanistan",'trade_value':150},
{'partner': "USA",'trade_value':1120}]
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'distance':1000},
{'country': "USA",'distance':1120}];
df2 = pd.DataFrame(x2)
I want to add a new column to df1
called 'distance' and assign each 'partner' in df1
with its appropriate 'distance' from df2
.
Note: sometimes a 'partner' country in df1
is repeated due to multiple trades. Also not all distances are available in df2
so I don't mind leaving a cell in df1 empty.
so far I am trying this to no avail:
#Add new column
df1['distance'] = 0;
for index, row in df1.iterrows():
for index, row2 in df2.iterrows():
if row['partner'] == row2['country']:
df1['distance'].replace(row['distance'], row2['distance'],inplace=True);
I am getting all the distance column filled with 1000, which is the made up distance between Afghanistan and the UK, but all other countries are getting the same value (when some countries should not even be getting any value at all)
Upvotes: 3
Views: 4051
Reputation: 4233
IIUC you can use .map
x1 = [{'partner':"Afghanistan", 'trade_value':100}, {'partner':"Zambia",'trade_value':110},
{'partner': "Germany",'trade_value':120},
{'partner': "Afghanistan",'trade_value':150},{'partner': "USA",'trade_value':1120}]
df1 = pd.DataFrame(x1)
x2 = [{'country': "Afghanistan", 'distance':1000}, {'country': "USA",'distance':1120}]
df2 = pd.DataFrame(x2)
df1['distance'] = df1['partner'].map(df2.set_index('country')['distance'])
print (df1)
partner trade_value distance
0 Afghanistan 100 1000.0
1 Zambia 110 NaN
2 Germany 120 NaN
3 Afghanistan 150 1000.0
4 USA 1120 1120.0
Upvotes: 5