L. Scheipers
L. Scheipers

Reputation: 39

Remove dates from a column that are not within range

I’ve two imported Panda DataFrames out of Excel (df1 and df2). Df1 represents the dates of replacement consisting out of a column with Dates and a column with Notes (200 rows). Df2 represents the dates when a check was performed (40 rows).

I would like to filter df1 (or generate a new table (df1')), that all dates of df1 which differ less than 5 days with the dates of df2 will be deleted in df1.

As a check is performed, we could say that the component was not replaced within a margin of 10 days.

e.g.

df1
22/04/2017
23/04/2017
07/06/2017
20/08/2017

df2 
21/04/2017

df1' 
07/06/2017
20/08/2017

Upvotes: 1

Views: 265

Answers (1)

cs95
cs95

Reputation: 402503

You can perform datetime subtraction with numpy broadcasting and filter df1 accordingly.

df1
           A
0 2017-04-22
1 2017-04-23
2 2017-07-06
3 2017-08-20

df2 
           A
0 2017-04-21

df1.A = pd.to_datetime(df1.A)    # convert to datetime first  
df2.A = pd.to_datetime(df2.A)  
df1[((df1.values[:, None] - df2.values) / pd.Timedelta(days=1) > 5).all(1)]

           A
2 2017-07-06
3 2017-08-20

For your data, this will generate 8000 elements on broadcasted subtraction, which certainly is manageable. Though note for much larger data, this results in a memory blowup (a pricey tradeoff for the high performance).

Upvotes: 2

Related Questions