Time Ordering of Multiple pandas DataFrames?

Given a DataFrame with an identifier and timestamp columns:

DF1:

id    visited
 X    2019-01-07T01:23:45.678
 X    2019-01-08T01:23:45.678
 Y    2019-01-08T03:44:54.789

And a separate DataFrame with similar attributes:

DF2:

id    visited
 X    2019-01-01T05:23:45.678
 Y    2019-01-08T06:44:54.789
 Y    2019-01-09T09:11:23.476
 Z    2019-01-03T08:12:12.678

How can the first DataFrame (DF1) be filtered for only those rows where the timestamp is before the timestamp in the second DataFrame (DF2) for the corresponding id column?

The resulting dataset would be:

id    visited
 Y    2019-01-08T03:44:54.789

Note: the X entries are filtered out because the timestamp in DF2 is before the timestamp in DF1 (not after).

Thank you in advance for your consideration and response.

Upvotes: 1

Views: 61

Answers (2)

PandaRoux
PandaRoux

Reputation: 21

First, rename the 'visited' column from DF2 to 'visited_2'. Then merge the two data frames on the id. Finally, select data with timestamp 'visited' before 'visited_2' and keep the columns you want.

DF2.rename(columns={'visited': 'visited_2'}, inplace=True)
DF = pd.merge(DF1, DF2, on=['id'])
DF.loc[DF['visited'] < DF['visited_2'], ['id', 'visited']]

Upvotes: 2

BENY
BENY

Reputation: 323316

We can use merge_asof

df2['Have']=True
s=pd.merge_asof(df1.sort_values('visited'),df2.sort_values('visited'),on='visited',by='id',direction = 'forward')
  id                 visited  Have
0  X 2019-01-07 01:23:45.678   NaN
1  X 2019-01-08 01:23:45.678   NaN
2  Y 2019-01-08 03:44:54.789  True
#s=s[s.Have].copy()

Upvotes: 2

Related Questions