Condition on all rows of a groupby

Question

Given a dataframe like this:

import datetime
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 2, 2, 2, 3],
    'Time': [datetime.date(2019, 12, 1), datetime.date(2019, 12, 5),
             datetime.date(2019, 12, 8), datetime.date(2019, 8, 4),
             datetime.date(2019, 11, 4), datetime.date(2019, 11, 4),
             datetime.date(2019, 11, 3), datetime.date(2019, 12, 20)],
    'Value': [2, 2, 2, 50, 7, 100, 7, 5]})

    ID  Time        Value
0   1   2019-12-01  2
1   1   2019-12-05  2
2   1   2019-12-08  2
3   1   2019-08-04  50
4   2   2019-11-04  7
5   2   2019-11-04  100
6   2   2019-11-03  7
7   3   2019-12-20  5

For each ID, I am interested only in the 3 latest values (ordered by Time),

and

I would like to keep only the IDs where all 3 of these values are < 10.

So my desired output will look like this:

    ID
0   1

The value 50 for the first ID is its fourth-latest value, so it falls outside the 3 latest and does not matter.

rpanai · Accepted Answer

Within each group of the groupby:

Sort the group by time
Build a boolean mask that marks whether the condition < 10 is satisfied
Take only the last 3 entries of that mask and sum them
Check that the sum is exactly 3 (a group with fewer than 3 rows can never reach 3, so it is dropped as well)

# True for an ID only if its 3 latest values are all below 10
grp = (df.groupby("ID")
         .apply(lambda x:
                x.sort_values("Time")["Value"].lt(10).tail(3).sum() == 3))

grp[grp]

ID
1    True
dtype: bool
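
If you want the result as a dataframe with a single ID column, as in the question, the same steps can also be written without apply. This is only a sketch of an alternative, not the code from the answer above; the names last3, count_below and result are made up for illustration.

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 3],
    "Time": [datetime.date(2019, 12, 1), datetime.date(2019, 12, 5),
             datetime.date(2019, 12, 8), datetime.date(2019, 8, 4),
             datetime.date(2019, 11, 4), datetime.date(2019, 11, 4),
             datetime.date(2019, 11, 3), datetime.date(2019, 12, 20)],
    "Value": [2, 2, 2, 50, 7, 100, 7, 5],
})

# Keep the 3 latest rows per ID (fewer if the ID has fewer than 3 rows).
last3 = df.sort_values("Time").groupby("ID").tail(3)

# Count, per ID, how many of those rows have Value < 10.
count_below = last3["Value"].lt(10).groupby(last3["ID"]).sum()

# A count of exactly 3 means there were 3 rows and all of them passed.
result = count_below[count_below.eq(3)].index.to_frame(index=False)
print(result)
```

For the sample data this leaves only ID 1: ID 2 has a 100 among its 3 latest values, and ID 3 has just one row.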
