Reputation: 307
I have two data frames I would like to merge. The Main Data Frame is Population
Pop:
Country Name Country Code Year Population CountryYear
0 Aruba ABW 1960 54208.0 ABW-1960
1 Andorra AND 1960 13414.0 AND-1960
I have a similar table with Country GDP
GDP:
Country Name Country Code Year GDP CountryYear
0 Aruba ABW 1960 0.000000e+00 ABW-1960
1 Andorra AND 1960 0.000000e+00 AND-1960
What I want is to have a new frame, Combined, that has fields:
Country Name
Country Code
Year
Population
CountryYear
From the Population Table and the respective GDP in table based on CountryYear and have that be the only column added to it.
I tried this but I got duplicate tables:
df_merged = pd.merge(poptransposed, gdptransposed, left_on=['CountryYear'],
right_on=['CountryYear'],
how='inner')
df_merged.head()
Country Name_x Country Code_x Year_x Population CountryYear Country Name_y Country Code_y Year_y GDP
Aruba ABW 1960 54208.0 ABW-1960 Aruba ABW 1960 0.000000e+00
Andorra AND 1960 13414.0 AND-1960 Andorra AND 1960 0.000000e+00
Upvotes: 0
Views: 946
Reputation: 7466
A solution is to use the Country Code as index and then use pandas concat function (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html):
Pop = Pop.set_index('Country Code', drop = True)
GDP = GDP.set_index('Country Code', drop = True)
df_merged= pd.concat([Pop, GDP['GDP'].to_frame('GDP')], axis = 1, join = 'inner').reset_index(drop = False)
Upvotes: 1