Reputation: 8352
I'm trying to merge two dataframes.
I want to merge on one column, that is the index of the second DataFrame and one column, that is a column in the second Dataframe. The column/index names are different in both DataFrames.
Example:
import pandas as pd
df2 = pd.DataFrame([(i,'ABCDEFGHJKL'[j], i*2 + j)
for i in range(10)
for j in range(10)],
columns = ['Index','Sub','Value']).set_index('Index')
df1 = pd.DataFrame([['SOMEKEY-A',0,'A','MORE'],
['SOMEKEY-B',4,'C','MORE'],
['SOMEKEY-C',7,'A','MORE'],
['SOMEKEY-D',5,'Z','MORE']
], columns=['key', 'Ext. Index', 'Ext. Sub', 'Description']
).set_index('key')
df1 prints out
key Ext. Index Ext. Sub Description
SOMEKEY-A 0 A MORE
SOMEKEY-B 4 C MORE
SOMEKEY-C 7 A MORE
SOMEKEY-D 5 Z MORE
the first lines of df2 are
Index Sub Value
0 A 0
0 B 1
0 C 2
0 D 3
0 E 4
I want to merge "Ext. Index" and "Ext. Sub" with DataFrame df2, where the index is "Index" and the column is "Sub"
The expected result is:
key Ext. Index Ext. Sub Description Ext. Value
SOMEKEY-A 0 A MORE 0
SOMEKEY-B 4 C MORE 10
SOMEKEY-C 7 A MORE 14
SOMEKEY-D 5 Z MORE None
Manually, the merge works like this
def get_value(x):
try:
return df2[(df2.Sub == x['Ext. Sub']) &
(df2.index == x['Ext. Index'])]['Value'].iloc[0]
except IndexError:
return None
df1['Ext. Value'] = df1.apply(get_value, axis = 1)
Can I do this with a pd.merge
or pd.concat
command, without
changing the df2 by turning the df2.index into a column?
Upvotes: 0
Views: 97
Reputation: 18647
Try using:
df_new = (df1.merge(df2[['Sub', 'Value']],
how='left',
left_on=['Ext. Index', 'Ext. Sub'],
right_on=[df2.index, 'Sub'])
.set_index(df1.index)
.drop('Sub', axis=1))
Upvotes: 2