Reputation: 4408
I have two datasets:
df1 and df2, where I would like the columns to merge together from df1, if the datetimes are within 20 seconds of df2
df1
Connect Ended
4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM
3/31/2020 11:00:08 AM 3/31/2020 11:00:10 AM
4/1/2020 10:00:05 PM 4/1/2020 12:00:05 PM
df2
Start End
4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM
3/31/2020 11:00:10 AM 3/31/2020 11:00:14 AM
Desired Output:
df3
Match_Start1 Match_End1 Match_Start2 Match_End2
4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM 4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM
3/31/2020 11:00:08 AM 3/31/2020 11:00:10 AM 3/31/2020 11:00:10 AM 3/31/2020 11:00:14 AM
df4 (unmatched)
Unmatched_Start Unmatched_end
4/1/2020 10:00:05 PM 4/1/2020 12:00:05 PM
Dput:
df1
:
' Connect Ended\n0 4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM\n1
3/31/2020 11:08:08 AM 3/31/2020 11:00:10 AM\n2 4/1/2020 10:00:05 PM 4/1/2020 12:00:05 PM'
df2
Out[117]:
' Start End\n0 4/6/2020 1:15:21 PM 4/6/2020 2:05:18 PM\n1
3/31/2020 11:08:08 AM 3/31/2020 11:00:14 AM\n2 NaN NaN'
What I am thinking:
df2 = pd.merge_asof(df1, df2, on="Connect", by = "Ended",
tolerance=pd.Timedelta('20s'), direction='backward')
However, how do I incorporate the condition of the 20 seconds, as well as show the unmatched dataset?
Any suggestion is appreciated
Upvotes: 1
Views: 180
Reputation: 28709
you need to read in the data and convert to datetime format - i read in the data with clipboard and parsed the dates there. Secondly, you need to sort the data by the key(in this case the key is 'Connect' for df1 and 'Start' for df2). after that pandas merge_asof should suffice. note that the merger can only occur on one key, not multiple :
sort the dataframes
df1 = df1.sort_values(['Connect','Ended'])
df2 = df2.sort_values(['Start','End'])
merge the dataframes
merger = pd.merge_asof(df1,df2,
left_on='Connect',
right_on='Start',
tolerance = pd.Timedelta('20s'),
direction='forward')
merger
Connect Ended Start End
0 2020-03-31 11:00:08 2020-03-31 11:00:10 2020-03-31 11:00:10 2020-03-31 11:00:14
1 2020-04-01 22:00:05 2020-04-01 12:00:05 NaT NaT
2 2020-04-06 13:15:21 2020-04-06 14:05:18 2020-04-06 13:15:21 2020-04-06 14:05:18
should be easy to pick the matched and unmatched rows :
matched = merger.dropna()
matched
Connect Ended Start End
0 2020-03-31 11:00:08 2020-03-31 11:00:10 2020-03-31 11:00:10 2020-03-31 11:00:14
2 2020-04-06 13:15:21 2020-04-06 14:05:18 2020-04-06 13:15:21 2020-04-06 14:05:18
unmatched = merger.loc[merger.isna().any(axis=1)]
unmatched
Connect Ended Start End
1 2020-04-01 22:00:05 2020-04-01 12:00:05 NaT NaT
hope it suffices... the docs have more examples to guide u thru if u get stomped
Upvotes: 3