find index based on condition in data.table

Question

I have data like this:

library(data.table)

dt1 <- data.table(
  id = 1,
  week = c( seq(1:260))
)
dt1[0:100, status := "A"][101:260, status := "B"]

dt2 <- data.table(
  id = 2,
  week = c( seq(1:260))
)
dt2[0:34, status := "A"][35:70, status := "B"][71:260, status := "A"]

dt3 <- data.table(
  id = 3,
  week = c( seq(1:260))
)
dt3[0:80, status := "A"][81:90, status := "B"][91:100, status := "A"][101:260, status := "B"]

data <- rbind(dt1,dt2,dt3)

I would like to find, for each unique id, the index before which a 52 week period has passed with at least 75% (39 weeks) with status=="B". I would want the following end product:

data_want <- data.table(
  id = c(1,3),
  week = c(153, 133)
)

For id 1, 52 weeks after 101 (the first row with status==B) is 153 and at least 75% of that period, status==B. And so on.

Any suggestions?

r2evans · Accepted Answer

If I'm understanding what you need (and disregard your data_want), then I think this works:

data[, fr := frollmean(status == "B", 52, align = "right", fill = 0), by = .(id)]
data[, .SD[shift(fr >= 0.75, type = "lead"),][1,], by = id]
#       id  week status        fr
#           
# 1:     1   138      B 0.7307692
# 2:     2    NA           NA
# 3:     3   128      B 0.7307692

Verification:

data[id == 1 & between(week, 130, 140),]
#        id  week status        fr
#            
#  1:     1   130      B 0.5769231
#  2:     1   131      B 0.5961538
#  3:     1   132      B 0.6153846
#  4:     1   133      B 0.6346154
#  5:     1   134      B 0.6538462
#  6:     1   135      B 0.6730769
#  7:     1   136      B 0.6923077
#  8:     1   137      B 0.7115385
#  9:     1   138      B 0.7307692
# 10:     1   139      B 0.7500000  <-- 52-week window had 75% "B"
# 11:     1   140      B 0.7692308

find index based on condition in data.table

Answers (1)

Related Questions