Reputation: 165
How can I get the index of the sample whose previous samples were consecutive and were greater than a fixed threshold in groups?
In the example below, I need to find, for each group, the time of the first sample that comes right after 3 consecutive samples whose speed is greater than the threshold (Speed > 35).
speed_threshold = 35
Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40 # Speed > 35
1 25 42 # Speed > 35
1 30 52 # Speed > 35
1 35 48 # <--- Return time = 35 as answer for Group 1 !
1 40 45
2 5 22
2 10 36 # Speed > 35
2 15 38 # Speed > 35
2 20 46 # Speed > 35
2 25 53 # <--- Return time = 25 as answer for Group 2 !
3 5 45
3 10 58 # <--- Return time = NA as answer for group 3 !
Upvotes: 0
Views: 922
Reputation: 269431
1) Using DF, defined reproducibly in the Note at the end, define a function ok which takes a logical vector indicating whether each speed is greater than 35 and returns a logical vector of the same length which is TRUE for the first speed that comes after 3 consecutive TRUEs. Apply ok to each group using ave and subset DF down to those rows which are TRUE, giving s.
If just returning the groups which satisfy the condition is sufficient then we are done; otherwise, define Groups as a one-column data frame with one row per Group and merge it with s, so that we get an NA for the groups that do not satisfy the condition.
library(zoo)
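ok <- function(x) {
  # TRUE where the 3 previous values are all TRUE
  r <- rollapplyr(x, list(-(1:3)), all, fill = FALSE)
  # keep only the first such position (cumsum(r) == 1 alone also stays TRUE on later FALSE rows)
  r & cumsum(r) == 1
}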
s <- subset(DF, ave(Speed > 35, Group, FUN = ok))
Groups <- data.frame(Group = unique(DF$Group))
merge(Groups, s, all.x = TRUE)[1:2]
## Group Time
## 1 1 35
## 2 2 25
## 3 3 NA
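To make it clearer what ok is meant to compute, here is a minimal base R sketch of the same idea without zoo. The names first_after_run and k are hypothetical, introduced only for this illustration; the loop is a plain restatement of "the k previous values are all TRUE", not the method used above.

```r
# Hypothetical helper: TRUE only at the first position whose k previous
# values are all TRUE; FALSE everywhere else.
first_after_run <- function(x, k = 3) {
  n <- length(x)
  hit <- logical(n)                      # all FALSE to start with
  if (n > k) {
    for (i in (k + 1):n) {
      hit[i] <- all(x[(i - k):(i - 1)])  # look only at the k previous values
    }
  }
  hit & cumsum(hit) == 1                 # keep the first hit only
}

# Group 1 of the example: Speed > 35 for each of the 8 rows
x <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)
which(first_after_run(x))
## [1] 7
```

Row 7 of Group 1 is the row with Time = 35. A group with fewer than k + 1 rows, such as Group 3, gives all FALSE.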
2) A second approach is to split DF by group and then perform the calculation over each component of the split.
library(zoo)
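calc <- function(x) {
  # r is TRUE on rows whose 3 previous speeds are all above 35
  r <- rollapplyr(x$Speed > 35, list(-(1:3)), all, fill = FALSE)
  # Time of the first such row, or NA if the group has none
  c(x$Time[which(r)], NA)[1]
}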
sapply(split(DF, DF$Group), calc)
## 1 2 3
## 35 25 NA
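The same split-and-apply idea can also be sketched in base R only, using rle to locate the runs. This is an alternative under stated assumptions, not the zoo code above: the function name first_time_after_run and its parameters threshold and k are hypothetical, and the data frame is rebuilt inline so that the sketch runs on its own.

```r
DF <- read.table(header = TRUE, text = "
Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40
1 25 42
1 30 52
1 35 48
1 40 45
2 5 22
2 10 36
2 15 38
2 20 46
2 25 53
3 5 45
3 10 58
")

# Hypothetical helper: Time of the row that follows the first k consecutive
# speeds above the threshold, or NA if no such row exists in the group.
first_time_after_run <- function(d, threshold = 35, k = 3) {
  r <- rle(d$Speed > threshold)
  starts <- cumsum(r$lengths) - r$lengths + 1       # first row of each run
  idx <- starts[r$values & r$lengths >= k] + k      # row right after k hits
  d$Time[idx[1]]                                    # NA if idx is empty or past the end
}

res <- sapply(split(DF, DF$Group), first_time_after_run)
res
##  1  2  3
## 35 25 NA
```

Like the other answers here, this does not check the speed of the returned row itself.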
Note: the input DF, in reproducible form, is:
Lines <- "Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40 # Speed > 35
1 25 42 # Speed > 35
1 30 52 # Speed > 35
1 35 48 # <--- Return time = 35 as answer for Group 1 !
1 40 45
2 5 22
2 10 36 # Speed > 35
2 15 38 # Speed > 35
2 20 46 # Speed > 35
2 25 53 # <--- Return time = 25 as answer for Group 2 !
3 5 45
3 10 58 # <--- Return time = NA as answer for group 3 !"
DF <- read.table(text = Lines, header = TRUE)
Upvotes: 1
Reputation: 28675
If a speed is above the threshold and it is the third such value in a row, capture its index in ends. Select the first index in ends and add one to get the index of the return time. (There may be more than one such run of 3, and therefore more than one element in ends; in that case the first element is the one to use.)
Note: In your example, the speed at the return time is always above the threshold. This code does not check that as a condition at all; it simply gives the first time after three consecutive rows with speeds above the threshold (regardless of whether the speed at that time is still above the threshold).
library(data.table)
setDT(df)
speed_thresh <- 35
df[, {
  above <- Speed > speed_thresh
  ends <- which(above & rowid(rleid(above)) == 3)
  .(Return_Time = Time[ends[1] + 1])
}, by = Group]
# Group Return_Time
# 1: 1 35
# 2: 2 25
# 3: 3 NA
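To see what the rowid(rleid(above)) == 3 step does, here is a small base R sketch on the Group 1 values. It assumes that sequence(rle(above)$lengths) gives the same within-run counter as rowid(rleid(above)); the variable run_pos is a name made up for this illustration.

```r
# Speed > 35 for the 8 rows of Group 1
above <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE)

# Position of each element inside its run of identical values
run_pos <- sequence(rle(above)$lengths)
run_pos
## [1] 1 2 3 1 2 3 4 5

# Index of the third consecutive value above the threshold
ends <- which(above & run_pos == 3)
ends
## [1] 6

# The row after it is the return row (Time = 35 in Group 1)
ends[1] + 1
## [1] 7
```

The above & part matters: without it, the third FALSE in a row (index 3 here) would be captured as well.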
Data used:
df <- fread('
Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40
1 25 42
1 30 52
1 35 48
1 40 45
2 5 22
2 10 36
2 15 38
2 20 46
2 25 53
3 5 45
3 10 58
')
Upvotes: 3
Reputation: 886938
One option is to use rleid to create a grouping variable from the runs of 'Speed' above and below the threshold, and then filter the runs that have at least 3 rows (n()) and in which all 'Speed' values are greater than 35. slice(1:3) then keeps the first three rows of each such run.
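library(dplyr)
library(data.table)  # for rleid()
# df1 is assumed to hold the question's data; speed_threshold is 35, as in the question
df1 %>%
  group_by(Group, grp = rleid(Speed > speed_threshold)) %>%
  filter(n() >= 3, all(Speed > speed_threshold)) %>%
  slice(1:3)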
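The pipeline above returns the rows of the qualifying runs rather than the return time itself. If the time is what is needed, one possible extension is sketched below. It is an assumption about how to go on from here, not part of the answer as posted: next_time, run and Return_Time are names chosen for illustration, and df1 is rebuilt inline so the sketch is self-contained.

```r
library(dplyr)
library(data.table)  # for rleid()

speed_threshold <- 35
df1 <- read.table(header = TRUE, text = "
Group Time Speed
1 5 25
1 10 23
1 15 21
1 20 40
1 25 42
1 30 52
1 35 48
1 40 45
2 5 22
2 10 36
2 15 38
2 20 46
2 25 53
3 5 45
3 10 58
")

result <- df1 %>%
  group_by(Group) %>%
  # run id within the group, plus the Time of the following row
  mutate(run = rleid(Speed > speed_threshold), next_time = lead(Time)) %>%
  group_by(Group, run) %>%
  filter(n() >= 3, all(Speed > speed_threshold)) %>%
  slice(3) %>%                            # third row of each qualifying run
  ungroup() %>%
  distinct(Group, .keep_all = TRUE) %>%   # first qualifying run per group
  select(Group, Return_Time = next_time)

result
```

On the example data this should give 35 for Group 1 and 25 for Group 2. Group 3 has no qualifying run, so it is dropped rather than reported as NA; a left join against the distinct groups would bring it back.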
Upvotes: 2