thePandasFriend
thePandasFriend

Reputation: 113

How to only use the first match when merging Pandas Dataframes?

I have two dataframes: 'data' and 'working'. I am trying to merge data into working on the ID to get the Name into working

Data:
ID  Name
a1  a1_Name
a1  a1_Name1

Working:
ID  SomeValues 
a1  123
a1  456

I want the end result to match only on the first found ID each time, but as of now when I do something like

working = pd.merge(working, data, left_on="ID", right_on="ID", sort=False)

I get this:

Working:
ID  SomeValues Name 
a1  123        a1_Name
a1  456        a1_Name1

and it just alternates between 'a1_Name' and 'a1_Name1'.

I would like it to output:

Working:
ID  SomeValues Name 
a1  123        a1_Name
a1  456        a1_Name

Upvotes: 2

Views: 1800

Answers (2)

anky
anky

Reputation: 75120

You can first drop duplicates and then map:

(working.assign(Name=working['ID']
.map(data.drop_duplicates('ID').set_index('ID')['Name'])))

   ID  SomeValues     Name
0  a1         123  a1_Name
1  a1         456  a1_Name

Note: you have to assign this to the same dataframe or a new dataframe as you want.

Upvotes: 1

Mayank Porwal
Mayank Porwal

Reputation: 34086

Like this:

In [3004]: Data['new_name'] = Data.groupby('ID')['Name'].transform('first')
In [3008]: Data.merge(Working, on='ID')[['ID','new_name','SomeValues']].drop_duplicates() 
Out[3008]: 
   ID new_name  SomeValues
0  a1  a1_Name         123
1  a1  a1_Name         456

Upvotes: 3

Related Questions