shruti

Reputation: 459

Merge two dataframes with duplicate entries but different values

An example shows best what I need to achieve: [image: example dataframes]

Although both dataframes contain duplicates, the values in the 'first_name' column differ. I want to merge the two and get output something like this:

[image: desired merged output]

df_a.merge(df_b, on='subject_id', how='left')

A plain pandas merge does not give this output because of the duplicates. How can I get my desired output? Any other suggestions are welcome.

Upvotes: 1

Views: 50

Answers (1)

jezrael

Reputation: 863166

I believe you need a helper column created by GroupBy.cumcount: add it to both dataframes, use it as an extra merge key, and drop it at the end:

df_a['g'] = df_a.groupby('subject_id').cumcount()
df_b['g'] = df_b.groupby('subject_id').cumcount()
df_a.merge(df_b, on=['subject_id', 'g'], how='left').drop('g', axis=1)
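To see the difference end to end, here is a minimal sketch. The sample data is hypothetical (the frames in the question were only shown as images), and `assign` is used instead of in-place column assignment so the original frames stay untouched; otherwise it is the same cumcount idea as above.

```python
import pandas as pd

# Hypothetical sample data; the original post's frames are not available.
df_a = pd.DataFrame({
    "subject_id": [1, 1, 2, 3],
    "first_name": ["Alex", "Amy", "Allen", "Alice"],
})
df_b = pd.DataFrame({
    "subject_id": [1, 1, 2, 3],
    "first_name": ["Billy", "Brian", "Bran", "Bryce"],
})

# A plain merge pairs every duplicate key on the left with every
# duplicate key on the right, so subject_id 1 produces 2 x 2 rows.
plain = df_a.merge(df_b, on="subject_id", how="left")

# Number the rows within each subject_id group (0, 1, ...) and merge on
# both the key and that counter, so the n-th duplicate matches the n-th.
# assign() returns new frames, so df_a and df_b are not modified.
left = df_a.assign(g=df_a.groupby("subject_id").cumcount())
right = df_b.assign(g=df_b.groupby("subject_id").cumcount())
merged = left.merge(right, on=["subject_id", "g"], how="left").drop(columns="g")

print(len(plain), len(merged))
print(merged)
```

With this sample data the plain merge returns 6 rows, while the cumcount merge returns 4, one per row of df_a. Note that the pairing depends on row order within each group, and with how='left' any subject_id that has more rows in df_a than in df_b gets NaN for the unmatched rows.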

Upvotes: 2
