Soccerrlife
Soccerrlife

Reputation: 79

Merge/join two dataframes based on index and colum

I would like to join (or merge?) two dataframes. They look like the following:

Table 1 ( = df)

index  |   year  |  country
----------------------------
0      |   1970  | NL
1      |   1970  | UK
2      |   1980  | US
3      |   1990  | NL
4      |   1990  | US

Table 2 (= df_gdp)

cntry  |   1970  |  1980   |   1990
-----------------------------------
NL     |   5     |    3    |   0
UK     |   1     |    7    |   1
US     |   9     |    2    |   0

The result should be Table1 with an additional column 'GDP'. The values of Table1.year and Table.country should be used to look up the value in Table2. So the result would be:

index  |   year  |  country  | GDP 
--------------------------------------
0      |   1970  | NL        | 5
1      |   1970  | UK        | 1
2      |   1980  | US        | 2
3      |   1990  | NL        | 0
4      |   1990  | US        | 0

I already wrote the function with .iterrows() but as expected, this does not have a good performance. Instead, I'm wondering whether the result can also be achieved by either .join() or .merge(). What I do not understand is how to merge/join based on the index (cntry) and a changing column (the year). The code of .iterrows()looks like the following:

# Add GDP data 
for index, row in df.iterrows():
    gdp_year = str(df.iloc[index].year)
    gdp_country = str(df.iloc[index].country)
    
    try:
        df.at[index, 'GDP'] = df_gdp.loc[gdp_country][gdp_year]
    except:
        df.at[index, 'GDP'] = 0
df

Upvotes: 2

Views: 122

Answers (4)

Sahil
Sahil

Reputation: 1

This code worked for me.

import pandas as pd

valueset_data = {
    'varName': ['ICLEVEL', 'ICLEVEL', 'State', 'State'],
    'codevalue': [1, 2, 1, 2],
    'varLabel': ['Less than 2 years (below associate)', 'Four years', 'MA', 'CT']
}

HD2019_data = {
    'InstituteID': [23, 32],
    'ICLEVEL': [1, 2],
    'State': [2, 1]
}

valueset = pd.DataFrame(valueset_data)
HD2019 = pd.DataFrame(HD2019_data)

valueset_dict = valueset.set_index(['varName', 'codevalue']).to_dict()['varLabel']

HD2019['ICLEVEL'] = HD2019.apply(lambda x: valueset_dict[('ICLEVEL', x['ICLEVEL'])], axis=1)
HD2019['State'] = HD2019.apply(lambda x: valueset_dict[('State', x['State'])], axis=1)

print(HD2019)

Upvotes: 0

k33da_the_bug
k33da_the_bug

Reputation: 820

You can also use merge as follows.

Explanation:

Set index to country column and then perform stack operation so that you can get columns in row. Then you need to reset_index() to get dataframe structure similar to df1. After that you need to specify column names using set_axis() so that you can map it later while performing merge operation.

And at last just perform merge operation on df1 and preprocessed df_m dataframe(default is inner join operation).

import pandas as pd

data1 = {
    'year':['1970','1970','1980','1990','1990'],
    'country':['NL','UK','US','NL','US']
}

data2 = {
    'country':['NL','UK','US'],
    '1970':[5,1,9],
    '1980':[3,7,2],
    '1990':[0,1,0]
}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# See below code
df_m = df2.set_index('country').stack().reset_index().set_axis(['country', 'year', 'GDP'], axis=1)
pd.merge(df1, df_m, on=['year','country'])

Upvotes: 1

Mykola Zotko
Mykola Zotko

Reputation: 17794

You can use melt and merge:

df2.rename({'cntry': 'country'}, axis=1)\
.melt('country', var_name='year', value_name='GDP')\
.merge(df1, on=['country', 'year'])

Output:

  country  year  GDP
0      NL  1970    5
1      UK  1970    1
2      US  1980    2
3      NL  1990    0
4      US  1990    0

Upvotes: 0

IoaTzimas
IoaTzimas

Reputation: 10624

You can create a function that takes a dataframe as argument, and apply it to df:

def f(x):
    return df_gdp.loc[x['country'],x['year']]

df['GDP']=df.apply(f, axis=1)

Result:

   year country  GDP
0  1970      NL    5
1  1970      UK    1
2  1980      US    2
3  1990      NL    0
4  1990      US    0

Upvotes: 1

Related Questions