Reputation: 17953
Given a DataFrame
with an identifier and timestamp columns:
DF1:
id visited
X 2019-01-07T01:23:45.678
X 2019-01-08T01:23:45.678
Y 2019-01-08T03:44:54.789
And a separate DataFrame with similar attributes:
DF2:
id visited
X 2019-01-01T05:23:45.678
Y 2019-01-08T06:44:54.789
Y 2019-01-09T09:11:23.476
Z 2019-01-03T08:12:12.678
How can the first DataFrame (DF1) be filtered for only those rows where the timestamp is before the timestamp in the second DataFrame (DF2) for the corresponding id
column?
The resulting dataset would be:
id visited
Y 2019-01-08T03:44:54.789
Note: the X
entries are filtered out because the timestamp in DF2 is before the timestamp in DF1 (not after).
Thank you in advance for your consideration and response.
Upvotes: 1
Views: 61
Reputation: 21
First, rename the 'visited' column from DF2 to 'visited_2'. Then merge the two data frames on the id. Finally, select data with timestamp 'visited' before 'visited_2' and keep the columns you want.
DF2.rename(columns={'visited': 'visited_2'}, inplace=True)
DF = pd.merge(DF1, DF2, on=['id'])
DF.loc[DF['visited'] < DF['visited_2'], ['id', 'visited']]
Upvotes: 2
Reputation: 323316
We can use merge_asof
df2['Have']=True
s=pd.merge_asof(df1.sort_values('visited'),df2.sort_values('visited'),on='visited',by='id',direction = 'forward')
id visited Have
0 X 2019-01-07 01:23:45.678 NaN
1 X 2019-01-08 01:23:45.678 NaN
2 Y 2019-01-08 03:44:54.789 True
#s=s[s.Have].copy()
Upvotes: 2