Raghu
Raghu

Reputation: 181

How to merge two data frames with duplicate rows?

I have two data frames df1 and df2. The df1 has repeated text wrt column name but column hobby changes. The df2 also has repeated text in the column name. I want to merge both the data frames and keep everything.

df1:
name   hobby

mike   cricket 
mike   football
jack   chess
jack   football
jack   vollyball
pieter sleeping
pieter cyclying

my df2 is

df2:
name

mike
pieter 
jack  
mike
pieter 

Now I have to merge df2 with df1 on name column So my resultant df3 should look like this:

df3:
name   hobby

mike   cricket 
mike   football
pieter sleeping
pieter cyclying
jack   chess
jack   football
jack   vollyball
mike   cricket 
mike   football
pieter sleeping
pieter cyclying


Upvotes: 1

Views: 41

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150825

You want to assign an order for df2, merge on name, then sort by the said order:

(df2.assign(rank=np.arange(len(df2)))
    .merge(df1, on='name')
    .sort_values('rank')
    .drop('rank', axis=1)
)

Output:

      name      hobby
0     mike    cricket
1     mike   football
4   pieter   sleeping
5   pieter   cyclying
8     jack      chess
9     jack   football
10    jack  vollyball
2     mike    cricket
3     mike   football
6   pieter   sleeping
7   pieter   cyclying

Upvotes: 1

Related Questions