Reputation: 989
Suppose I have the following 2 DataFrames:
df1, whose index is ['NameID', 'Date']. For example, df1 can be a panel dataset of historical salaries of employees in a company.
df2, whose index is ['NameID']. For example, df2 can be a dataset of employees' birthdays and SSNs.
What is the most efficient way to join df1 and df2 on the 'NameID' index level on a 1:m basis (one row of df2 matching many rows of df1)? As far as I can tell, DataFrame.join() doesn't allow a 1:m join. I know I can call reset_index() on both df1 and df2 and then use DataFrame.merge() to join them on columns, but I suspect that is not efficient.
Code:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'NameID': ['A', 'B', 'C'] * 3,
                    'Date': ['20180801'] * 3 + ['20180802'] * 3 + ['20180803'] * 3,
                    'Salary': np.random.rand(9)
                    })
df1 = df1.set_index(['NameID', 'Date'])
df1
NameID Date Salary
A 20180801 0.831064
B 20180801 0.419464
C 20180801 0.239779
A 20180802 0.500048
B 20180802 0.317452
C 20180802 0.188051
A 20180803 0.076196
B 20180803 0.060435
C 20180803 0.297118
df2 = pd.DataFrame({'NameID': ['A', 'B', 'C'],
                    'SSN': [999, 888, 777]
                    })
df2 = df2.set_index(['NameID'])
df2
NameID SSN
A 999
B 888
C 777
The result I want to get is:
NameID Date Salary SSN
A 20180801 0.831064 999
A 20180802 0.500048 999
A 20180803 0.076196 999
B 20180801 0.419464 888
B 20180802 0.317452 888
B 20180803 0.060435 888
C 20180801 0.239779 777
C 20180802 0.188051 777
C 20180803 0.297118 777
Upvotes: 0
Views: 111
Reputation: 989
Answering on behalf of warwick12
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
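For completeness, here is a minimal, self-contained sketch of the index-on-index merge above, run against the frames from the question. It assumes a reasonably recent pandas release, in which a single-level index can be matched by name against one level of a MultiIndex. The names `merged` and `joined` are only illustrative. It also tries `df1.join(df2)`, which in such versions appears to handle this 1:m case as well.

```python
import numpy as np
import pandas as pd

# Same toy frames as in the question.
df1 = pd.DataFrame({
    "NameID": ["A", "B", "C"] * 3,
    "Date": ["20180801"] * 3 + ["20180802"] * 3 + ["20180803"] * 3,
    "Salary": np.random.rand(9),
}).set_index(["NameID", "Date"])

df2 = pd.DataFrame({
    "NameID": ["A", "B", "C"],
    "SSN": [999, 888, 777],
}).set_index("NameID")

# Index-on-index merge: df2's 'NameID' index is matched against the
# 'NameID' level of df1's MultiIndex (inner join by default).
merged = pd.merge(df1, df2, left_index=True, right_index=True)

# DataFrame.join aligns on the index by default (left join).
joined = df1.join(df2)

print(merged.index.names)
print(joined.head())
```

Neither call needs reset_index(), and both index levels of df1 should still be present in the result.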
Upvotes: 0
Reputation: 13757
See Michael B's answer. In addition, you may want to sort the result to get your requested output:
pd.merge(df1, df2, on='NameID', how='left').sort_values('SSN', ascending=False)
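Sorting by 'SSN' only reproduces the requested order because the SSNs in the example happen to decrease with NameID. As a hedged alternative, the sketch below sorts on the index levels instead. It joins on the index rather than with `on='NameID'`, since, as I understand the pandas merging guide, merging on only some levels of a MultiIndex can drop the remaining levels. The name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same toy frames as in the question.
df1 = pd.DataFrame({
    "NameID": ["A", "B", "C"] * 3,
    "Date": ["20180801"] * 3 + ["20180802"] * 3 + ["20180803"] * 3,
    "Salary": np.random.rand(9),
}).set_index(["NameID", "Date"])

df2 = pd.DataFrame({
    "NameID": ["A", "B", "C"],
    "SSN": [999, 888, 777],
}).set_index("NameID")

# Join on the index, then order rows by NameID and, within each NameID, by Date.
result = (
    pd.merge(df1, df2, left_index=True, right_index=True)
    .sort_index(level=["NameID", "Date"])
)

print(result)
```

This ordering depends only on the index values, so it should still hold if the SSNs are not monotonic in NameID.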
Upvotes: 0