Tatik

Reputation: 1227

How to check the duration of a process as blocks of rows?

I need to check whether the column val1 contains values greater than 5 that are maintained for at least 30 minutes. I then need the id of the first row of each block of rows in which the values stay above 5 for at least 30 minutes.

This is the DataFrame df:

date_time             val1
10-12-2018 20:30:00   1
10-12-2018 20:35:00   6
10-12-2018 20:38:00   7
10-12-2018 20:45:00   6
10-12-2018 20:58:00   4
10-12-2018 21:15:00   6   
10-12-2018 21:28:00   8
10-12-2018 21:30:00   7
10-12-2018 22:10:00   6
10-12-2018 22:15:00   4
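For reference, the sample frame can be reconstructed like this (a sketch; note that pd.to_datetime parses these ambiguous dates month-first by default, i.e. as 2018-10-12):

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "date_time": [
        "10-12-2018 20:30:00", "10-12-2018 20:35:00", "10-12-2018 20:38:00",
        "10-12-2018 20:45:00", "10-12-2018 20:58:00", "10-12-2018 21:15:00",
        "10-12-2018 21:28:00", "10-12-2018 21:30:00", "10-12-2018 22:10:00",
        "10-12-2018 22:15:00",
    ],
    "val1": [1, 6, 7, 6, 4, 6, 8, 7, 6, 4],
})
# Parse the strings into real timestamps (month-first by default).
df["date_time"] = pd.to_datetime(df["date_time"])
```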

In this example, we have two blocks of rows when the values of val1 are greater than 5:

Block 1:

10-12-2018 20:35:00   6
10-12-2018 20:38:00   7
10-12-2018 20:45:00   6

Block 2:

10-12-2018 21:15:00   6   
10-12-2018 21:28:00   8
10-12-2018 21:30:00   7
10-12-2018 22:10:00   6

However, Block 1 should be discarded because its duration is 10 minutes, which is less than 30 minutes. Block 2 lasts 55 minutes, which is more than 30 minutes and therefore meets the criteria.

The first row id should be 5 in this example (the id of the first row of Block 2: 10-12-2018 21:15:00   6).

This is how I tried to solve the task. However, my code does not split the rows into separate blocks, because the values of val1 can rise above and fall below the threshold repeatedly.

c = "val1"
df.date_time = pd.to_datetime(df.date_time)
# all rows above the threshold, ignoring gaps between blocks
maintained = df[df[c] > 5][["date_time"]]
if len(maintained) > 0:
    start = maintained["date_time"].iloc[0]
    end = maintained["date_time"].iloc[-1]
    if abs(end - start).total_seconds() / 60 > 30:
        print(True)
    else:
        print(False)
else:
    print(False)

Upvotes: 1

Views: 69

Answers (1)

BENY

Reputation: 323356

Here is one method: build your conditions one by one. First, group the runs of values greater than 5 into separate blocks by taking cumsum over the below-threshold mask; then, with transform, compute the difference between the max and min date_time within each group; finally, filter the groups on that duration.

s = df.val1.le(5)          # True where a run of values > 5 breaks (5 itself is not "greater than 5")
df1 = df[~s].copy()        # keep only the rows with val1 > 5
# duration of each consecutive block: max - min of date_time per group
s1 = df1.groupby(s.cumsum()).date_time.transform(lambda x: x.max() - x.min()).dt.seconds
yourdf = df1.loc[s1 >= 1800]   # "at least 30 minutes" is >= 1800 seconds
yourdf
yourdf
Out[174]: 
            date_time  val1
5 2018-10-12 21:15:00     6
6 2018-10-12 21:28:00     8
7 2018-10-12 21:30:00     7
8 2018-10-12 22:10:00     6

If more than one block fits the condition, you can save each block in a dict keyed by its group number:

d = {k: g for k, g in yourdf.groupby(s.cumsum())}
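To recover the first row id of each qualifying block (5 in the example), one option, assuming the default integer index serves as the row id, is to take the head of each group after filtering:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date_time": pd.to_datetime([
        "10-12-2018 20:30:00", "10-12-2018 20:35:00", "10-12-2018 20:38:00",
        "10-12-2018 20:45:00", "10-12-2018 20:58:00", "10-12-2018 21:15:00",
        "10-12-2018 21:28:00", "10-12-2018 21:30:00", "10-12-2018 22:10:00",
        "10-12-2018 22:15:00",
    ]),
    "val1": [1, 6, 7, 6, 4, 6, 8, 7, 6, 4],
})

s = df.val1.le(5)           # True where a run of values > 5 breaks
df1 = df[~s].copy()         # rows with val1 > 5
# duration of each consecutive block
dur = df1.groupby(s.cumsum()).date_time.transform(lambda x: x.max() - x.min())
yourdf = df1[dur.dt.total_seconds() >= 1800]   # blocks lasting >= 30 minutes

# first row id of every qualifying block
first_ids = yourdf.groupby(s.cumsum()).head(1).index.tolist()
print(first_ids)   # [5]
```

groupby aligns the s.cumsum() Series on the index, so it can be reused on the filtered frame, and head(1) keeps one row per remaining group.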

Upvotes: 1
