Rafael
Rafael

Reputation: 671

Pandas Time Series: Remove Rows Per ID

I have a Pandas dataframe of the form:

Date       ID     Temp
2019/03/27 1       23
2019/04/27 2       32
2019/04/27 1       42
2019/04/28 1       41
2019/01/27 2       33
2019/08/27 2       23

What I need to do?

Select Rows which are at least 30 days old from their latest measurement for each id.

i.e. the latest date for Id = 2 is 2019/08/27, so for ID =2 I need to select rows which are at least 30 days older. So, the row with 2019/08/27 for ID=2 will itself be dropped.

Similarly, the latest date for ID = 1 is 2019/04/28. This means I can select rows for ID =1 only if the date is less than 2019/03/28 (30 days older). So, the row 2019/04/27 with ID=1 will be dropped.

How to do this in Pandas. Any help is greatly appreciated.

Thank you.

Final dataframe will be:

Date       ID     Temp
2019/03/27 1       23
2019/04/27 2       32
2019/01/27 2       33

Upvotes: 0

Views: 43

Answers (1)

BENY
BENY

Reputation: 323316

In your case using groupby + transform('last') and filter the original df

Yourdf=df[df.Date<df.groupby('ID').Date.transform('last')-pd.Timedelta('30 days')].copy()
        Date  ID  Temp
0 2019-03-27   1    23
1 2019-04-27   2    32
4 2019-01-27   2    33

Notice I am adding the .copy at the end to prevent the setting copy error.

Upvotes: 5

Related Questions