tandem
tandem

Reputation: 2238

check the difference in dates pandas and only keep certain IDs

I'm trying got find IDs which have no dates below a certain timestamp. In other words, I am trying to find dates which are above a certain timestamp.

The below code works, but is there is a better way to do the same procedure?

#pd.set_option('display.max_rows', 1000)
import pandas as pd
from datetime import date, timedelta
last_y_days = pd.datetime.today() - timedelta(days=60)
tmp_df = df[['ID','TIMESTAMP']].drop_duplicates()
tmp_df['result'] = tmp_df['TIMESTAMP'] < last_y_days
foobar = tmp_df.groupby('ID')['result'].unique().reset_index()
foobar[foobar['result'].apply(lambda x: True not in x)]

If we assume this to be the data, I want those IDs which have no timestamps before the last 60 days. In this case, the only answer is 1

    ID  TIMESTAMP
1   1   2020-08-26
3   2   2020-04-18
4   2   2020-03-31
7   2   2020-01-10
10  2   2020-05-13
14  2   2020-02-24
16  2   2020-02-20
19  2   2020-08-03
34  3   2020-09-29
54  3   2020-08-14
55  3   2020-10-01
70  4   2020-01-25
72  4   2020-04-22
73  4   2020-09-01
75  4   2020-03-03
76  4   2020-07-21
79  4   2020-04-20
81  4   2020-04-28
83  4   2020-08-22
85  4   2020-06-03

Upvotes: 1

Views: 44

Answers (1)

jezrael
jezrael

Reputation: 862671

Use numpy.setdiff1d with filtered ID in DataFrame.loc:

df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])

from datetime import date, timedelta
last_y_days = pd.datetime.today() - timedelta(days=60)
print (last_y_days)

ids = np.setdiff1d(df['ID'], df.loc[df['TIMESTAMP'] < last_y_days, 'ID'].unique()).tolist()
print (ids)
[1, 3]
    

Or test if at least one True per groups by GroupBy.any for mask and then filter not matched index values:

m = (df['TIMESTAMP'] < last_y_days).groupby(df['ID']).any()

ids = m.index[~m].tolist()
print (ids)
[1, 3]

Upvotes: 3

Related Questions