Reputation: 1010
I have previously worked with Stata and am now trying to do the same in Python. However, I am having trouble with the merge command, and I must be missing something. The two dataframes I want to merge look like this:
df1:
Date id Market_Cap
2000 1 400
2000 2 200
2001 1 410
2001 2 220
df2:
id Ticker
1 Shell
2 ExxonMobil
My aim now is to get the following dataset:
Date id Market_Cap Ticker
2000 1 400 Shell
2000 2 200 ExxonMobil
2001 1 410 Shell
2001 2 220 ExxonMobil
I tried the following command:
merged= pd.merge(df1, df2, how="left", on="id")
This merges the datasets but gives me only NaNs in the Ticker column. I looked at several sources, and maybe I am mistaken, but isn't how="left" the right option for my purpose? I also tried "right" and "outer", which don't give the result I want, and "inner" does not seem to work here at all.
Am I missing something crucial?
Upvotes: 2
Views: 9726
Reputation: 863301
The problem is your id column: in one DataFrame it is object (evidently string) and in the other it is int, so nothing matches and you get NaN.
If both columns have the same dtype:
print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64
merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
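A quick way to spot this kind of mismatch before merging is to compare the key dtypes directly. The following is only a minimal sketch: the helper name key_dtypes and the two small frames are made up for illustration and are not part of your data.

```python
import pandas as pd

def key_dtypes(left, right, key):
    """Return the dtype of `key` in each frame (hypothetical helper)."""
    return left[key].dtype, right[key].dtype

# Illustrative frames: ids stored as strings on one side, integers on the other
left = pd.DataFrame({"id": ["1", "2"]})
right = pd.DataFrame({"id": [1, 2]})

left_dtype, right_dtype = key_dtypes(left, right, "id")
same = left_dtype == right_dtype
print(left_dtype, right_dtype, same)
```

If the two dtypes differ, the keys will not match until one side is converted.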
Another solution, if you only need to add one new column, is map:
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
Simulate your problem:
print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 NaN
1 2000 2 200 NaN
2 2001 1 410 NaN
3 2001 2 220 NaN
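For a copy-paste reproduction, here is a minimal sketch that builds the frames from your question, assuming the ids in df1 were read in as strings (that part is my guess about your data). The names lookup and mismatched are just illustrative. Note that, as far as I know, recent pandas versions may raise a ValueError for pd.merge on an object column and an int64 column instead of returning NaN, so the sketch uses map.

```python
import pandas as pd

# df1 with ids stored as strings (assumed cause of the problem)
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
# df2 with ids stored as integers
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Lookup Series indexed by the integer id
lookup = df2.set_index("id")["Ticker"]

# The string "1" is not found in the integer index, so every lookup misses
mismatched = df1["id"].map(lookup)
print(mismatched.isna().all())
```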
The solution is to convert the column to int with astype (or to convert the id column in df2 to str):
df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
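The same conversion also makes your original left merge work. Below is a minimal end-to-end sketch, again assuming the ids in df1 are strings. The validate="m:1" argument is an optional addition of mine, not something your code needs; it asks pandas to check that each id appears at most once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],   # strings, as in the simulated problem
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Align the key dtype with df2 before merging
df1["id"] = df1["id"].astype(int)

# Left merge keeps every row of df1 and attaches the matching Ticker
merged = pd.merge(df1, df2, how="left", on="id", validate="m:1")
print(merged)
```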
Upvotes: 10