Adding columns after comparing values in 2 dataframes with different lengths

Question

I referenced another stackoverflow, but the value came out weird and I asked again.

like

compare 2 columns in different dataframes

df1
Name  date
A     2019-01-24
A     2019-02-14
B     2018-05-12
B     2019-07-21
C     2016-04-24
C     2017-09-11
D     2020-11-24

df2
Name  date2        value
A     2019-01-24  123124
A     2019-02-14  675756
B     2018-05-11  624622
B     2019-07-20  894321
C     2016-04-23  321032190
C     2017-09-11  201389

I would like to compare the name and date of df1 and the name and date2 of df2, and if it matches, add value to the new column of df1.

so I using

df1['new'] = df1.merge(df2, left_on = ['Name','date'], right_on = ['Name','date2'])['value']

When I applied this to my actual data, I found that strange values(Not what I want, it's weird) were created in the new column. What's wrong with my code?

++++(after answer)

Looking at the answer of @jezrael below, it would be good to apply it according to the characteristics of the data to be applied. In the case of me, there were many duplicate data having the same day, so it could not be applied with simple left_on and right_on.

jezrael · Accepted Answer

You can filter only necessary columns in df2 by list and instead left_on and right_on is used rename with on parameter for avoid same columns date and date2 in output and also is used left join:

df = df1.merge(df2[['Name','date2', 'value']].rename(columns={'date2':'date'), 
               on = ['Name','date'], 
               how='left')

First solution with left join:

df1['new'] = df1.merge(df2,
                       left_on=['Name','date'],
                       right_on=['Name','date2'],
                       how='left')['value']

I think reason why weird output should be created default index after merge with inner join, so if assign back rows should not match. If use left join then index of left DataFrame in not changed, so assign column working correct.

Adding columns after comparing values in 2 dataframes with different lengths

Answers (1)

Related Questions