Reputation: 5
I have a dataframe like this. I want to find the duration for whcih either v1 or v2 is above 110.
timestamp v1 v2
14-05-2019 04:28 112.2 111.0
14-05-2019 04:30 112.2 110.9
14-05-2019 04:39 101.4 101.8
14-05-2019 04:40 108.0 108.8
14-05-2019 04:45 101.1 101.5
14-05-2019 04:46 100.8 101.2
14-05-2019 05:32 111.6 111.5
14-05-2019 05:36 111.5 111.5
14-05-2019 05:39 111.5 111.5
14-05-2019 05:41 111.5 111.5
14-05-2019 05:46 111.5 111.4
14-05-2019 05:46 111.5 111.3
14-05-2019 05:47 111.5 111.3
14-05-2019 05:51 111.2 111.2
14-05-2019 05:56 111.2 111.2
14-05-2019 05:57 111.2 111.2
Mycode :
str = 0
end = 0
dur = 0
diff = 0
for (i in (1:norws(x))) {
if((x['v1'][i,] >=110) || (x['v2'][i,] >=110)){
if((str !=0) && (i-str == 1)){
str = i}else{
str = i
end = i - 1
}}
if((str<end) && (end != 0)){
diff = as.numeric(x[end,1] - x[str,1],units="mins")
dur = dur + diff
}}
print(dur)
I want to get the duration of continous data where either v1 or v2 is greater than 110. Also if both start and end are same,them that is not considered or the difference is 0 sec. I will get two sets here :
14-05-2019 04:28 112.2 111.0
14-05-2019 04:30 112.2 110.9
here the time is 2 minute 14-05-2019 04:30 - 14-05-2019 04:28 Similarly,
14-05-2019 05:32 111.6 111.5
14-05-2019 05:36 111.5 111.5
14-05-2019 05:39 111.5 111.5
14-05-2019 05:41 111.5 111.5
14-05-2019 05:46 111.5 111.4
14-05-2019 05:46 111.5 111.3
14-05-2019 05:47 111.5 111.3
14-05-2019 05:51 111.2 111.2
14-05-2019 05:56 111.2 111.2
14-05-2019 05:57 111.2 111.2
here the time is 25 minute. ie, 14-05-2019 05:57 - 14-05-2019 05:32 So, totally I get : 27 minute
Upvotes: 0
Views: 228
Reputation: 27732
Here is a data.table
approach of your question.
It uses data.table::rleid()
to create groups based on the condition v1 or v2 > 110
. It then summarises on thw rows where this condition is valid, by subtracting the first timestamp of each group from the last timestamp of each group. This results in a column duration
of difftimes by group.
Alternatively, you can calculate the entire sum of all duration. format()
is used to output the answer as a string, in stead of a difftime.
sample data
library(data.table)
DT <- fread("timestamp v1 v2
14-05-2019T04:28 112.2 111.0
14-05-2019T04:30 112.2 110.9
14-05-2019T04:39 101.4 101.8
14-05-2019T04:40 108.0 108.8
14-05-2019T04:45 101.1 101.5
14-05-2019T04:46 100.8 101.2
14-05-2019T05:32 111.6 111.5
14-05-2019T05:36 111.5 111.5
14-05-2019T05:39 111.5 111.5
14-05-2019T05:41 111.5 111.5
14-05-2019T05:46 111.5 111.4
14-05-2019T05:46 111.5 111.3
14-05-2019T05:47 111.5 111.3
14-05-2019T05:51 111.2 111.2
14-05-2019T05:56 111.2 111.2
14-05-2019T05:57 111.2 111.2")
#create timestamps
DT[, timestamp := as.POSIXct( timestamp, format = "%d-%m-%YT%H:%M" )]
code
#create groups based on v1|v2 > 110
DT[, group_id := rleid( v1 > 110 | v2 > 110 ) ][]
#group by group_id, only on rows where v1 or v2 > 110
DT[ v1 > 110 | v2 > 110, ][, .(duration = max(timestamp) - min(timestamp) ), by = .(group_id)]
output
# group_id duration
# 1: 1 2 mins
# 2: 3 25 mins
alternative summary
replace last line of code with:
format(
sum(
DT[ v1 > 110 | v2 > 110, ][, .(duration = max(timestamp) - min(timestamp) ), by = .(group_id)]$duration
)
)
which gives the total of all 'groups'.
#27 mins
To include start and end of each period, use:
DT[ v1 > 110 | v2 > 110, ][, .(start = min(timestamp),
end = max(timestamp),
duration = max(timestamp) - min(timestamp) ),
by = .(group_id)][,group_id := NULL]
# start end duration
# 1: 2019-05-14 04:28:00 2019-05-14 04:30:00 2 mins
# 2: 2019-05-14 05:32:00 2019-05-14 05:57:00 25 mins
Upvotes: 1