getting total duration based on continuous occurrence of a condition in R dataframe

Question

I have a dataframe like this. I want to find the duration for whcih either v1 or v2 is above 110.

timestamp              v1     v2
    14-05-2019 04:28    112.2   111.0
    14-05-2019 04:30    112.2   110.9
    14-05-2019 04:39    101.4   101.8
    14-05-2019 04:40    108.0   108.8
    14-05-2019 04:45    101.1   101.5
    14-05-2019 04:46    100.8   101.2
    14-05-2019 05:32    111.6   111.5
    14-05-2019 05:36    111.5   111.5
    14-05-2019 05:39    111.5   111.5
    14-05-2019 05:41    111.5   111.5
    14-05-2019 05:46    111.5   111.4
    14-05-2019 05:46    111.5   111.3
    14-05-2019 05:47    111.5   111.3
    14-05-2019 05:51    111.2   111.2
    14-05-2019 05:56    111.2   111.2
    14-05-2019 05:57    111.2   111.2

Mycode :

str = 0
end = 0
dur = 0
diff = 0
for (i in (1:norws(x))) {
  if((x['v1'][i,] >=110) || (x['v2'][i,] >=110)){
    if((str !=0) && (i-str == 1)){
      str = i}else{
        str = i
        end = i - 1
      }}
  if((str



I want to get the duration of continous data where either v1 or v2 is greater than 110. Also if 
both start and end are same,them that is not considered or the difference is 0 sec.
  I will get two sets here :

14-05-2019 04:28    112.2   111.0
14-05-2019 04:30    112.2   110.9


here the time is 2 minute 14-05-2019 04:30 - 14-05-2019 04:28
Similarly, 

14-05-2019 05:32    111.6   111.5
14-05-2019 05:36    111.5   111.5
14-05-2019 05:39    111.5   111.5
14-05-2019 05:41    111.5   111.5
14-05-2019 05:46    111.5   111.4
14-05-2019 05:46    111.5   111.3
14-05-2019 05:47    111.5   111.3
14-05-2019 05:51    111.2   111.2
14-05-2019 05:56    111.2   111.2
14-05-2019 05:57    111.2   111.2


here the time is 25 minute. ie, 14-05-2019 05:57 - 14-05-2019 05:32 
So, totally I get : 27 minute

Wimpel · Accepted Answer

Here is a data.table approach of your question.

It uses data.table::rleid() to create groups based on the condition v1 or v2 > 110. It then summarises on thw rows where this condition is valid, by subtracting the first timestamp of each group from the last timestamp of each group. This results in a column duration of difftimes by group.

Alternatively, you can calculate the entire sum of all duration. format() is used to output the answer as a string, in stead of a difftime.

sample data

library(data.table)
DT <- fread("timestamp              v1     v2
14-05-2019T04:28    112.2   111.0
14-05-2019T04:30    112.2   110.9
14-05-2019T04:39    101.4   101.8
14-05-2019T04:40    108.0   108.8
14-05-2019T04:45    101.1   101.5
14-05-2019T04:46    100.8   101.2
14-05-2019T05:32    111.6   111.5
14-05-2019T05:36    111.5   111.5
14-05-2019T05:39    111.5   111.5
14-05-2019T05:41    111.5   111.5
14-05-2019T05:46    111.5   111.4
14-05-2019T05:46    111.5   111.3
14-05-2019T05:47    111.5   111.3
14-05-2019T05:51    111.2   111.2
14-05-2019T05:56    111.2   111.2
14-05-2019T05:57    111.2   111.2")

#create timestamps
DT[, timestamp := as.POSIXct( timestamp, format = "%d-%m-%YT%H:%M" )]

code

#create groups based on v1|v2 > 110
DT[, group_id := rleid( v1 > 110 | v2 > 110 ) ][]
#group by group_id, only on rows where v1 or v2 > 110
DT[ v1 > 110 | v2 > 110, ][, .(duration = max(timestamp) - min(timestamp) ), by = .(group_id)]

output

#    group_id duration
# 1:        1   2 mins
# 2:        3  25 mins

alternative summary

replace last line of code with:

format( 
  sum( 
    DT[ v1 > 110 | v2 > 110, ][, .(duration = max(timestamp) - min(timestamp) ), by = .(group_id)]$duration 
  )
)

which gives the total of all 'groups'.

#27 mins

update based on question in comment

To include start and end of each period, use:

DT[ v1 > 110 | v2 > 110, ][, .(start = min(timestamp),
                               end = max(timestamp),
                               duration = max(timestamp) - min(timestamp) ), 
                           by = .(group_id)][,group_id := NULL]

#                  start                 end duration
# 1: 2019-05-14 04:28:00 2019-05-14 04:30:00   2 mins
# 2: 2019-05-14 05:32:00 2019-05-14 05:57:00  25 mins

getting total duration based on continuous occurrence of a condition in R dataframe

Answers (1)

update based on question in comment

Related Questions