576i
576i

Reputation: 8352

Pandas DataFrame: How to merge left with a second DataFrame on a combination of index and columns

I'm trying to merge two dataframes.

I want to merge on one column, that is the index of the second DataFrame and one column, that is a column in the second Dataframe. The column/index names are different in both DataFrames.

Example:

import pandas as pd

df2 = pd.DataFrame([(i,'ABCDEFGHJKL'[j], i*2 + j) 
                    for i in range(10) 
                    for j in range(10)],
                    columns = ['Index','Sub','Value']).set_index('Index')

df1 = pd.DataFrame([['SOMEKEY-A',0,'A','MORE'],
                    ['SOMEKEY-B',4,'C','MORE'],
                    ['SOMEKEY-C',7,'A','MORE'],
                    ['SOMEKEY-D',5,'Z','MORE']
                   ], columns=['key', 'Ext. Index', 'Ext. Sub', 'Description']
                  ).set_index('key')

df1 prints out

key Ext. Index  Ext. Sub    Description
SOMEKEY-A   0   A   MORE
SOMEKEY-B   4   C   MORE
SOMEKEY-C   7   A   MORE
SOMEKEY-D   5   Z   MORE

the first lines of df2 are

Index   Sub Value
0   A   0
0   B   1
0   C   2
0   D   3
0   E   4

I want to merge "Ext. Index" and "Ext. Sub" with DataFrame df2, where the index is "Index" and the column is "Sub"

The expected result is:

key Ext. Index  Ext. Sub    Description Ext. Value
SOMEKEY-A   0   A   MORE    0
SOMEKEY-B   4   C   MORE    10
SOMEKEY-C   7   A   MORE    14
SOMEKEY-D   5   Z   MORE    None

Manually, the merge works like this

def get_value(x):
    try:
        return df2[(df2.Sub == x['Ext. Sub']) & 
                   (df2.index == x['Ext. Index'])]['Value'].iloc[0]
    except IndexError:
        return None

df1['Ext. Value'] = df1.apply(get_value, axis = 1)

Can I do this with a pd.merge or pd.concat command, without changing the df2 by turning the df2.index into a column?

Upvotes: 0

Views: 97

Answers (1)

Chris Adams
Chris Adams

Reputation: 18647

Try using:

df_new = (df1.merge(df2[['Sub', 'Value']],
                    how='left',
                    left_on=['Ext. Index', 'Ext. Sub'],
                    right_on=[df2.index, 'Sub'])
          .set_index(df1.index)
          .drop('Sub', axis=1))

Upvotes: 2

Related Questions