Laura Walker
Laura Walker

Reputation: 307

Merging only certain columns in Python

I have two data frames I would like to merge. The Main Data Frame is Population

Pop:
        Country Name    Country Code    Year    Population  CountryYear
    0   Aruba           ABW             1960    54208.0     ABW-1960
    1   Andorra         AND             1960    13414.0     AND-1960

I have a similar table with Country GDP

GDP:

    Country Name    Country Code    Year    GDP              CountryYear
0   Aruba           ABW             1960    0.000000e+00     ABW-1960
1   Andorra         AND             1960    0.000000e+00     AND-1960

What I want is to have a new frame, Combined, that has fields:

Country Name
Country Code
Year    
Population  
CountryYear

From the Population Table and the respective GDP in table based on CountryYear and have that be the only column added to it.

I tried this but I got duplicate tables:

df_merged = pd.merge(poptransposed, gdptransposed, left_on=['CountryYear'],
              right_on=['CountryYear'],
              how='inner')
df_merged.head()


  Country Name_x    Country Code_x  Year_x  Population  CountryYear Country Name_y  Country Code_y  Year_y  GDP
Aruba   ABW 1960    54208.0 ABW-1960    Aruba   ABW 1960    0.000000e+00
Andorra AND 1960    13414.0 AND-1960    Andorra AND 1960    0.000000e+00

Upvotes: 0

Views: 946

Answers (1)

FLab
FLab

Reputation: 7466

A solution is to use the Country Code as index and then use pandas concat function (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html):

Pop = Pop.set_index('Country Code', drop = True)
GDP = GDP.set_index('Country Code', drop = True)

df_merged= pd.concat([Pop, GDP['GDP'].to_frame('GDP')], axis = 1, join = 'inner').reset_index(drop = False)

Upvotes: 1

Related Questions