Reputation: 39
I’ve two imported Panda DataFrames out of Excel (df1 and df2). Df1 represents the dates of replacement consisting out of a column with Dates and a column with Notes (200 rows). Df2 represents the dates when a check was performed (40 rows).
I would like to filter df1 (or generate a new table (df1')), that all dates of df1 which differ less than 5 days with the dates of df2 will be deleted in df1.
As a check is performed, we could say that the component was not replaced within a margin of 10 days.
e.g.
df1
22/04/2017
23/04/2017
07/06/2017
20/08/2017
df2
21/04/2017
df1'
07/06/2017
20/08/2017
Upvotes: 1
Views: 265
Reputation: 402503
You can perform datetime subtraction with numpy broadcasting and filter df1
accordingly.
df1
A
0 2017-04-22
1 2017-04-23
2 2017-07-06
3 2017-08-20
df2
A
0 2017-04-21
df1.A = pd.to_datetime(df1.A) # convert to datetime first
df2.A = pd.to_datetime(df2.A)
df1[((df1.values[:, None] - df2.values) / pd.Timedelta(days=1) > 5).all(1)]
A
2 2017-07-06
3 2017-08-20
For your data, this will generate 8000
elements on broadcasted subtraction, which certainly is manageable. Though note for much larger data, this results in a memory blowup (a pricey tradeoff for the high performance).
Upvotes: 2