Reputation: 353
I have data like this:
library(data.table)
dt1 <- data.table(
id = 1,
week = c( seq(1:260))
)
dt1[0:100, status := "A"][101:260, status := "B"]
dt2 <- data.table(
id = 2,
week = c( seq(1:260))
)
dt2[0:34, status := "A"][35:70, status := "B"][71:260, status := "A"]
dt3 <- data.table(
id = 3,
week = c( seq(1:260))
)
dt3[0:80, status := "A"][81:90, status := "B"][91:100, status := "A"][101:260, status := "B"]
data <- rbind(dt1,dt2,dt3)
I would like to find, for each unique id, the index before which a 52 week period has passed with at least 75% (39 weeks) with status=="B". I would want the following end product:
data_want <- data.table(
id = c(1,3),
week = c(153, 133)
)
For id 1, 52 weeks after 101 (the first row with status==B) is 153 and at least 75% of that period, status==B. And so on.
Any suggestions?
Upvotes: 0
Views: 592
Reputation: 160407
If I'm understanding what you need (and disregard your data_want
), then I think this works:
data[, fr := frollmean(status == "B", 52, align = "right", fill = 0), by = .(id)]
data[, .SD[shift(fr >= 0.75, type = "lead"),][1,], by = id]
# id week status fr
# <num> <int> <char> <num>
# 1: 1 138 B 0.7307692
# 2: 2 NA <NA> NA
# 3: 3 128 B 0.7307692
Verification:
data[id == 1 & between(week, 130, 140),]
# id week status fr
# <num> <int> <char> <num>
# 1: 1 130 B 0.5769231
# 2: 1 131 B 0.5961538
# 3: 1 132 B 0.6153846
# 4: 1 133 B 0.6346154
# 5: 1 134 B 0.6538462
# 6: 1 135 B 0.6730769
# 7: 1 136 B 0.6923077
# 8: 1 137 B 0.7115385
# 9: 1 138 B 0.7307692
# 10: 1 139 B 0.7500000 <-- 52-week window had 75% "B"
# 11: 1 140 B 0.7692308
Upvotes: 6