Reputation: 671
I have a Pandas dataframe of the form:
Date ID Temp
2019/03/27 1 23
2019/04/27 2 32
2019/04/27 1 42
2019/04/28 1 41
2019/01/27 2 33
2019/08/27 2 23
What I need to do?
Select Rows which are at least 30 days old from their latest measurement for each id.
i.e. the latest date for Id = 2
is 2019/08/27
, so for ID =2
I need to select rows which are at least 30 days older. So, the row with 2019/08/27
for ID=2
will itself be dropped.
Similarly, the latest date for ID = 1 is 2019/04/28
. This means I can select rows for ID =1
only if the date is less than 2019/03/28
(30 days older). So, the row 2019/04/27
with ID=1
will be dropped.
How to do this in Pandas. Any help is greatly appreciated.
Thank you.
Final dataframe will be:
Date ID Temp
2019/03/27 1 23
2019/04/27 2 32
2019/01/27 2 33
Upvotes: 0
Views: 43
Reputation: 323316
In your case using groupby
+ transform('last')
and filter the original df
Yourdf=df[df.Date<df.groupby('ID').Date.transform('last')-pd.Timedelta('30 days')].copy()
Date ID Temp
0 2019-03-27 1 23
1 2019-04-27 2 32
4 2019-01-27 2 33
Notice I am adding the .copy
at the end to prevent the setting copy error.
Upvotes: 5