Reputation: 20342
I am trying to merge two data frames and eliminate dupes.
Here is DF#1:
import pandas as pd
data1 = {'id':['168'],'group_id':['360002136432'],'ticket_form_id':['360000159712']}
df1 = pd.DataFrame(data1)
print(df1)
Here is DF #2
data2 = {'id':['362936613051','362936613051','362936613051'],'ticket_id':['168','168','168']}
df2 = pd.DataFrame(data2)
print(df2)
I am trying to merge, or consolidate, DF#1 and DF#2, so it looks like this.
  id      group_id  ticket_form_id            ID
 168  360002136432    360000159712  362936613051
It would be some kind of inner join (I think) between DF#1.id and DF#2.ticket_id, but I keep getting a bunch of dupes in the merged data frame. How can I eliminate the dupes in the merged data frame?
So, for ID = 8 I would expect to see 362563740691, and for ID = 10 I would expect to see 362563746711. Instead, I'm seeing 362785076491 for ID = 8.
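For reference, the behaviour described above can be reproduced with the sample frames from the question. This is a minimal sketch: a plain inner join emits one output row per matching df2 row, and because both frames carry an `id` column, pandas suffixes them as `id_x`/`id_y`:

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({'id': ['168'],
                    'group_id': ['360002136432'],
                    'ticket_form_id': ['360000159712']})
df2 = pd.DataFrame({'id': ['362936613051'] * 3,
                    'ticket_id': ['168'] * 3})

# Plain inner join: one output row per matching df2 row, hence the dupes
merged = df1.merge(df2, left_on='id', right_on='ticket_id')
print(merged)   # three identical rows

# Dropping duplicates collapses them back to a single row
deduped = merged.drop_duplicates()
print(deduped)  # one row
```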
Upvotes: 0
Views: 336
Reputation: 5955
Assuming that all of the id/ticket_id pairs in df2 are duplicated as in the example:
df_new = df1.merge(df2[~df2.duplicated()], left_on='id', right_on='ticket_id').drop('ticket_id', axis=1)
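Run against the sample frames from the question, this sketch shows the one-liner filtering df2 down to its unique rows before the merge. Note that both frames carry an `id` column, so pandas suffixes the result's columns as `id_x`/`id_y`; a rename would produce the `ID` header shown in the question:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['168'],
                    'group_id': ['360002136432'],
                    'ticket_form_id': ['360000159712']})
df2 = pd.DataFrame({'id': ['362936613051'] * 3,
                    'ticket_id': ['168'] * 3})

# Keep only unique df2 rows, join on id <-> ticket_id, drop the key column
df_new = (df1.merge(df2[~df2.duplicated()],
                    left_on='id', right_on='ticket_id')
             .drop('ticket_id', axis=1))
print(df_new)   # a single consolidated row
```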
Upvotes: 1
Reputation: 815
Your df2 does have a lot of duplicate rows. I don't know whether you need to keep the redundant data; if not, you can drop the duplicates from df2 first:
df2.drop_duplicates(inplace = True)
print(df1.merge(df2, left_on = 'id', right_on = 'ticket_id'))
This immediately removes the duplicate rows from the final dataframe.
Another possibility is to remove the duplicated rows after the merge:
df1 = df1.merge(df2, left_on = 'id', right_on = 'ticket_id', how = 'inner')
df1.drop_duplicates(inplace = True)
print(df1)
Upvotes: 2