Reputation: 575
I referenced another stackoverflow, but the value came out weird and I asked again.
like
compare 2 columns in different dataframes
df1
Name date
A 2019-01-24
A 2019-02-14
B 2018-05-12
B 2019-07-21
C 2016-04-24
C 2017-09-11
D 2020-11-24
df2
Name date2 value
A 2019-01-24 123124
A 2019-02-14 675756
B 2018-05-11 624622
B 2019-07-20 894321
C 2016-04-23 321032190
C 2017-09-11 201389
I would like to compare the name and date of df1 and the name and date2 of df2, and if it matches, add value to the new column of df1.
so I using
df1['new'] = df1.merge(df2, left_on = ['Name','date'], right_on = ['Name','date2'])['value']
When I applied this to my actual data, I found that strange values(Not what I want, it's weird) were created in the new column. What's wrong with my code?
++++(after answer)
Looking at the answer of @jezrael below, it would be good to apply it according to the characteristics of the data to be applied. In the case of me, there were many duplicate data having the same day, so it could not be applied with simple left_on and right_on.
Upvotes: 2
Views: 70
Reputation: 863611
You can filter only necessary columns in df2
by list and instead left_on
and right_on
is used rename
with on
parameter for avoid same columns date
and date2
in output and also is used left join
:
df = df1.merge(df2[['Name','date2', 'value']].rename(columns={'date2':'date'),
on = ['Name','date'],
how='left')
First solution with left join
:
df1['new'] = df1.merge(df2,
left_on=['Name','date'],
right_on=['Name','date2'],
how='left')['value']
I think reason why weird output should be created default index after merge with inner join, so if assign back rows should not match. If use left join then index of left DataFrame in not changed, so assign column working correct.
Upvotes: 2