Reputation: 2238
I'm trying got find IDs which have no dates below a certain timestamp. In other words, I am trying to find dates which are above a certain timestamp.
The below code works, but is there is a better way to do the same procedure?
#pd.set_option('display.max_rows', 1000)
import pandas as pd
from datetime import date, timedelta
last_y_days = pd.datetime.today() - timedelta(days=60)
tmp_df = df[['ID','TIMESTAMP']].drop_duplicates()
tmp_df['result'] = tmp_df['TIMESTAMP'] < last_y_days
foobar = tmp_df.groupby('ID')['result'].unique().reset_index()
foobar[foobar['result'].apply(lambda x: True not in x)]
If we assume this to be the data, I want those IDs which have no timestamps before the last 60 days. In this case, the only answer is 1
ID TIMESTAMP
1 1 2020-08-26
3 2 2020-04-18
4 2 2020-03-31
7 2 2020-01-10
10 2 2020-05-13
14 2 2020-02-24
16 2 2020-02-20
19 2 2020-08-03
34 3 2020-09-29
54 3 2020-08-14
55 3 2020-10-01
70 4 2020-01-25
72 4 2020-04-22
73 4 2020-09-01
75 4 2020-03-03
76 4 2020-07-21
79 4 2020-04-20
81 4 2020-04-28
83 4 2020-08-22
85 4 2020-06-03
Upvotes: 1
Views: 44
Reputation: 862671
Use numpy.setdiff1d
with filtered ID
in DataFrame.loc
:
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])
from datetime import date, timedelta
last_y_days = pd.datetime.today() - timedelta(days=60)
print (last_y_days)
ids = np.setdiff1d(df['ID'], df.loc[df['TIMESTAMP'] < last_y_days, 'ID'].unique()).tolist()
print (ids)
[1, 3]
Or test if at least one True
per groups by GroupBy.any
for mask and then filter not matched index values:
m = (df['TIMESTAMP'] < last_y_days).groupby(df['ID']).any()
ids = m.index[~m].tolist()
print (ids)
[1, 3]
Upvotes: 3