Time Ordering of Multiple pandas DataFrames?

Question

Given a DataFrame with an identifier and timestamp columns:

DF1:

id    visited
 X    2019-01-07T01:23:45.678
 X    2019-01-08T01:23:45.678
 Y    2019-01-08T03:44:54.789

And a separate DataFrame with similar attributes:

DF2:

id    visited
 X    2019-01-01T05:23:45.678
 Y    2019-01-08T06:44:54.789
 Y    2019-01-09T09:11:23.476
 Z    2019-01-03T08:12:12.678

How can the first DataFrame (DF1) be filtered for only those rows where the timestamp is before the timestamp in the second DataFrame (DF2) for the corresponding id column?

The resulting dataset would be:

id    visited
 Y    2019-01-08T03:44:54.789

Note: the X entries are filtered out because the timestamp in DF2 is before the timestamp in DF1 (not after).

Thank you in advance for your consideration and response.

BENY · Accepted Answer

We can use merge_asof

df2['Have']=True
s=pd.merge_asof(df1.sort_values('visited'),df2.sort_values('visited'),on='visited',by='id',direction = 'forward')
  id                 visited  Have
0  X 2019-01-07 01:23:45.678   NaN
1  X 2019-01-08 01:23:45.678   NaN
2  Y 2019-01-08 03:44:54.789  True
#s=s[s.Have].copy()

Time Ordering of Multiple pandas DataFrames?

Answers (2)

Related Questions