cbelmon
cbelmon

Reputation: 1

Create a new dataframe matching IDs based off a date range

I have two pandas dataframes containing IDs and birth dates. The birth dates in dataframe 1 are only an estimation so in order to capture as many similarities as possible I am looking to match IDs in Dataframe 2 that are within 45 days of the birthdate in Dataframe 1. Example as follows:

df_A:
ID     birth_date
A_2    1-1-1945
A_3    1-1-1967
A_4    1-1-2000
A_5    1-1-1994
A_6    2-1-1994

df_B:
ID     birth_date
B_1    2-2-1945
B_2    12-25-1944
B_3    1-5-2000
B_4    1-7-1994

Expected Output:
ID_1     ID_2
A_2      B_1
A_2      B_2
A_4      B_3
A_5      B_4
A_6      B_4

Edited in more example to fully show what I would like to receive.

Upvotes: 0

Views: 42

Answers (2)

cbelmon
cbelmon

Reputation: 1

I think I figured it out myself. Using the answer I found here: cartesian product in pandas

My code looks like this now:

df_A.loc[:, 'key'] = 1
df_B.loc[:, 'key'] = 1

cart = pd.merge(df_A, df_B, on='key') [['df_A_ID', 'df_A_birth_date', 'df_B_ID', 'df_B_birth_date']]

cart = cart[(cart['df_B_birth_date'] < cart['df_A_birth_date'] + pd.to_timedelta(45, unit='d')) 
            & (cart['df_B_birth_date'] > cart['df_A_birth_date'] + pd.to_timedelta(-45, unit='d'))].sort_values(['df_A_ID']).reset_index().drop(columns={'index'})

Upvotes: 0

anky
anky

Reputation: 75100

This is merge_asof with tolerance:

pd.merge_asof(df2.sort_values('birth_date'),df1.sort_values('birth_date'),
on='birth_date',tolerance=pd.Timedelta(45,unit='D'),direction='nearest'
         ,suffixes=('_2','_1')).drop('birth_date',1)

  ID_2 ID_1
0  B_2  A_2
1  B_1  A_2
2  B_3  A_4

Note, please convert the birth_date columns to datetime first:

df1.birth_date=pd.to_datetime(df1.birth_date)
df2.birth_date=pd.to_datetime(df2.birth_date)

Upvotes: 2

Related Questions