ASH
ASH

Reputation: 20342

Merge Two DFs Without Dupes

I am trying to merge two data frames and eliminate dupes.

Here is DF#1:

import pandas as pd
data1 = {'id':['168'],'group_id':['360002136432'],'ticket_form_id':['360000159712']}
df1 = pd.DataFrame(data1)
print(df1)

Here is DF #2

data2 = {'id':['362936613051','362936613051','362936613051'],'ticket_id':['168','168','168']}
df2 = pd.DataFrame(data2)
print(df2)

I am trying to merge, or consolidate, DF#1 and DF#2, so it looks like this.

id  group_id    ticket_form_id  ID
168 360002136432    360000159712    362936613051

It would be some kind of inner join (I think) between DF#1.id and DF#2.ticket_id, but I keep getting a bunch of dupes in the merged data frame. How can I eliminate dupes in the merged data frame.

So, for ID = 8, I would expect to see 362563740691 and for ID = 10, I would expect to see 362563746711.

enter image description here

Instead, I'm seeing 362785076491 for ID = 8.

enter image description here

Upvotes: 0

Views: 336

Answers (2)

G. Anderson
G. Anderson

Reputation: 5955

Assuming that all of the id/ticket_id pairs in df2 are duplicated as in the example:

df_new=df1.merge(df2[~df2.duplicated()==1], left_on='id', right_on='ticket_id').drop('ticket_id', axis=1)

Upvotes: 1

kaihami
kaihami

Reputation: 815

Your df2 do have a lot of duplicate values. I don't know if you need to keep redundant data, if you don't need to you can drop df2 duplicates

df2.drop_duplicates(inplace = True)
print(df1.merge(df2, left_on = 'id', right_on = 'ticket_id'))

This immediately remove the duplicate rows in the final dataframe.

Another possibility is to remove duplicated rows after the merge.

df1 = df1.merge(df2, left_on = 'id', right_on = 'ticket_id', how = 'inner')                                                                                
df1.drop_duplicates(inplace = True)                                                                                                                        
print(df1) 

Upvotes: 2

Related Questions