Lynn
Lynn

Reputation: 4408

Join two datasets in Python if datetimes are within certain time of one another, as well as create an 'unmatched' dataset

I have two datasets:

df1 and df2, where I would like the columns to merge together from df1, if the datetimes are within 20 seconds of df2

df1





Connect                 Ended


4/6/2020 1:15:21 PM     4/6/2020 2:05:18 PM

3/31/2020 11:00:08 AM   3/31/2020 11:00:10 AM

4/1/2020 10:00:05 PM    4/1/2020 12:00:05 PM

df2

Start                   End


4/6/2020 1:15:21 PM     4/6/2020 2:05:18 PM

3/31/2020 11:00:10 AM   3/31/2020 11:00:14 AM

Desired Output:

df3

Match_Start1             Match_End1                     Match_Start2              Match_End2



4/6/2020 1:15:21 PM     4/6/2020 2:05:18 PM          4/6/2020 1:15:21 PM     4/6/2020 2:05:18 PM

3/31/2020 11:00:08 AM   3/31/2020 11:00:10 AM        3/31/2020 11:00:10 AM   3/31/2020 11:00:14 AM    

df4 (unmatched)

Unmatched_Start         Unmatched_end  



4/1/2020 10:00:05 PM    4/1/2020 12:00:05 PM

Dput:

df1

   :
   '  Connect   Ended\n0    4/6/2020 1:15:21 PM     4/6/2020 2:05:18 PM\n1  
   3/31/2020 11:08:08 AM  3/31/2020 11:00:10 AM\n2   4/1/2020 10:00:05 PM    4/1/2020 12:00:05    PM'

df2

  Out[117]:
  '    Start    End\n0    4/6/2020 1:15:21 PM    4/6/2020 2:05:18 PM\n1     
  3/31/2020 11:08:08 AM  3/31/2020 11:00:14 AM\n2                    NaN                    NaN'

What I am thinking:

df2 = pd.merge_asof(df1, df2, on="Connect", by = "Ended", 
tolerance=pd.Timedelta('20s'), direction='backward')

However, how do I incorporate the condition of the 20 seconds, as well as show the unmatched dataset?

Any suggestion is appreciated

Upvotes: 1

Views: 180

Answers (1)

sammywemmy
sammywemmy

Reputation: 28709

you need to read in the data and convert to datetime format - i read in the data with clipboard and parsed the dates there. Secondly, you need to sort the data by the key(in this case the key is 'Connect' for df1 and 'Start' for df2). after that pandas merge_asof should suffice. note that the merger can only occur on one key, not multiple :

sort the dataframes

df1 = df1.sort_values(['Connect','Ended'])
df2 = df2.sort_values(['Start','End'])

merge the dataframes

merger = pd.merge_asof(df1,df2,
                       left_on='Connect',
                       right_on='Start',
                       tolerance = pd.Timedelta('20s'),
                       direction='forward')

merger

     Connect                   Ended            Start              End
0   2020-03-31 11:00:08 2020-03-31 11:00:10 2020-03-31 11:00:10 2020-03-31 11:00:14
1   2020-04-01 22:00:05 2020-04-01 12:00:05        NaT             NaT
2   2020-04-06 13:15:21 2020-04-06 14:05:18 2020-04-06 13:15:21 2020-04-06 14:05:18

should be easy to pick the matched and unmatched rows :

matched = merger.dropna()
matched


         Connect              Ended                    Start         End
0   2020-03-31 11:00:08 2020-03-31 11:00:10 2020-03-31 11:00:10 2020-03-31 11:00:14
2   2020-04-06 13:15:21 2020-04-06 14:05:18 2020-04-06 13:15:21 2020-04-06 14:05:18

unmatched = merger.loc[merger.isna().any(axis=1)]
unmatched

    Connect               Ended           Start End
1   2020-04-01 22:00:05 2020-04-01 12:00:05 NaT NaT

hope it suffices... the docs have more examples to guide u thru if u get stomped

Upvotes: 3

Related Questions