kms
kms

Reputation: 2024

Pandas merge two dataframes with one to many relationship

I am trying to merge two pandas DataFrames with one of many relationship.

import pandas as pd

df1 = pd.DataFrame({'name': ['AA', 'BB', 'CC'],
                    'col1': [1, 2, 3],
                    'col2': [1, 2, 3] })

df2 = pd.DataFrame({'name': ['AA', 'AA', 'BB'],
                    'col1': [1, 2, 3],
                    'col2': [1, 2, 3] })

df_merged = pd.merge(
                     df1, 
                     df2, 
                     left_on = 'name',
                     right_on = 'name',
                     how = "inner"
                    )

Two questions.

  1. How do I join the two DataFrames using pd.merge without inserting new rows in df1? Shape of df1 needs not change. name is unique in df1.

For rows with one-to-many relationship, I'd like join the first row from df2.

  1. When I merge the two DataFrames, it creates new columns - col1.x, col2.x, col1.y, col2.y? I'd like only copy of those columns.

Upvotes: 0

Views: 2503

Answers (1)

wwnde
wwnde

Reputation: 26676

Use left join and drop duplicates

df1.merge(df2, how='left', on='name').drop_duplicates(subset='name',keep='first')

Upvotes: 1

Related Questions