Tokaalmighty
Tokaalmighty

Reputation: 402

Pandas merge fail to extract common Index values

I'm trying to merge 2 DataFrames of different sizes, both are indexed by 'Country'. The first dataframe 'GDP_EN' contains every country in the world, and the second dataframe 'ScimEn' contains 15 countries.

When I try to merge these DataFrames,instead of merging the columns based on index countries of ScimEn, I got back 'Country_x' and 'Country_y'. 'Country_x' came from GDP_EN, which are the first 15 countries in alphabetical order. 'Country_y' are the 15 countries from ScimEn. I'm wondering why didn't they merge?

I used:

DF=pd.merge(GDP_EN,ScimEn,left_index=True,right_index=True,how='right')

Upvotes: 2

Views: 436

Answers (1)

jezrael
jezrael

Reputation: 863166

I think both DataFrames are not indexes by Country, but Country is column add parameter on='Country':

GDP_EN = pd.DataFrame({'Country':['USA','France','Slovakia', 'Russia'],
                       'a':[4,8,6,9]})
print (GDP_EN)
    Country  a
0       USA  4
1    France  8
2  Slovakia  6
3    Russia  9

ScimEn = pd.DataFrame({'Country':['France','Slovakia'],
                       'b':[80,70]})
print (ScimEn)
    Country   b
0    France  80
1  Slovakia  70

DF=pd.merge(GDP_EN,ScimEn,left_index=True,right_index=True,how='right')
print (DF)
  Country_x  a Country_y   b
0       USA  4    France  80
1    France  8  Slovakia  70

DF=pd.merge(GDP_EN,ScimEn,on='Country',how='right')
print (DF)
    Country  a   b
0    France  8  80
1  Slovakia  6  70

If Country are indexes it works perfectly:

GDP_EN = pd.DataFrame({'Country':['USA','France','Slovakia', 'Russia'],
                       'a':[4,8,6,9]}).set_index('Country')
print (GDP_EN)
          a
Country    
USA       4
France    8
Slovakia  6
Russia    9

print (GDP_EN.index)
Index(['USA', 'France', 'Slovakia', 'Russia'], dtype='object', name='Country')

ScimEn = pd.DataFrame({'Country':['France','Slovakia'],
                       'b':[80,70]}).set_index('Country')
print (ScimEn)
           b
Country     
France    80
Slovakia  70

print (ScimEn.index)
Index(['France', 'Slovakia'], dtype='object', name='Country')

DF=pd.merge(GDP_EN,ScimEn,left_index=True,right_index=True,how='right')
print (DF)
          a   b
Country        
France    8  80
Slovakia  6  70

Upvotes: 1

Related Questions