Reputation: 49
This is my first question on stackoverflow so please bear with me.
I'm new to pandas, I need to merge two dataframes based on values in columns:
df1:
Col1 Col2
0 A 10
1 B 12
2 C 22
3 D 22
4 E 22
5 F 11
6 G 55
7 H 60
df2:
Col3 Col4
0 J 11
1 K 22
2 L 22
3 M 60
4 N 80
5 O 45
6 P 12
7 Q 65
The resulting dataframe should be like:
Col1 Col2 Col3 Col4
0 B 12 P 12
1 C 22 K 22
2 D 22 L 22
3 H 60 M 60
It should compare Col2 and Col4 and only keep common rows and merge two data frames into one. I tried following code but result is not what I desire.
both_DFS=df1.merge(df2,left_on='Col2',right_on='Col4')
For this result is:
Col1 Col2 Col3 Col4
0 B 12 P 12
1 C 22 K 22
2 C 22 L 22
3 D 22 K 22
4 D 22 L 22
5 E 22 K 22
6 E 22 L 22
7 H 60 M 60
Upvotes: 2
Views: 376
Reputation: 150735
You need to enumerate the duplicate rows with groupby().cumcount
then merge:
(df1.assign(idx=df1.groupby('Col2').cumcount())
.merge(df2.assign(idx=df2.groupby('Col4').cumcount()),
left_on=['Col2','idx'],
right_on=['Col4','idx'])
)
Output:
Col1 Col2 idx Col3 Col4
0 B 12 0 P 12
1 C 22 0 K 22
2 D 22 1 L 22
3 F 11 0 J 11
4 H 60 0 M 60
Upvotes: 1