mkp
mkp

Reputation: 51

Compare multiple columns of two data frames using pandas

I have two data frames; df1 has Id and sendDate and df2 has Id and actDate. The two df's are not the same shape - df2 is a lookup table. There may be multiple instances of Id.

ex.

df1 = pd.DataFrame({"Id": [1, 1, 2, 3, 2],
                     "sendDate": ["2019-09-24", "2020-09-11", "2018-01-06", "2018-01-06", "2019-09-24"]})

df2 = pd.DataFrame({"Id": [1, 2, 2],
                     "actDate": ["2019-09-24", "2019-09-24", "2020-09-11"]})

I want to add a boolean True/False in df1 to find when df1.Id == df2.Id and df1.sendDate == df2.actDate.

Expected output would add a column to df1:

df1 = pd.DataFrame({"Id": [1, 1, 2, 3, 2],
                         "sendDate": ["2019-09-24", "2020-09-11", "2018-01-06", "2018-01-06", "2019-09-24"],
"Match?": [True, False, False, False, True]})

I'm new to python from R, so please let me know what other info you may need.

Upvotes: 0

Views: 1108

Answers (4)

Nk03
Nk03

Reputation: 14949

A vectorized approach via numpy -

import numpy as np
df1['Match'] = np.where((df1['Id'].isin(df2['Id'])) & (df1['sendDate'].isin(df2['actDate'])),True,False)

Upvotes: 1

Randy
Randy

Reputation: 14849

The .isin() approaches will find values where the ID and date entries don't necessarily appear together (e.g. Id=1 and date=2020-09-11 in your example). You can check for both by doing a .merge() and checking when df2's date field is not null:

df1['match'] = df1.merge(df2, how='left', left_on=['Id', 'sendDate'], right_on=['Id', 'actDate'])['actDate'].notnull()

Upvotes: 1

Henry Ecker
Henry Ecker

Reputation: 35626

Use isin and boolean indexing

import pandas as pd

df1 = pd.DataFrame({"Id": [1, 1, 2, 3, 2],
                    "sendDate": ["2019-09-24", "2020-09-11",
                                 "2018-01-06", "2018-01-06",
                                 "2019-09-24"]})

df2 = pd.DataFrame({"Id": [1, 2, 2],
                    "actDate": ["2019-09-24", "2019-09-24", "2020-09-11"]})

df1['Match'] = (df1['Id'].isin(df2['Id'])) & (df1['sendDate'].isin(df2['actDate']))
print(df1)

Output:

   Id    sendDate  Match
0   1  2019-09-24   True
1   1  2020-09-11   True
2   2  2018-01-06  False
3   3  2018-01-06  False
4   2  2019-09-24   True

Upvotes: 1

Yehuda
Yehuda

Reputation: 1893

You can use .isin():

df1['id_bool'] = df1.Id.isin(df2.Id)
df1['date_bool'] = df1.sendDate.isin(df2.actDate)

Check out the documentation here.

Upvotes: 0

Related Questions