Reputation: 3
I am trying to obtain the maximum of column DT$pna between the peak and trough events marked in their respective columns of a data.table (i.e. DT$peak and DT$trough). These columns contain the strings "peak" and "trough" to mark the beginning and end of each event, and NA elsewhere. The for loop below works on a very reduced sample, but because the data.table has millions of rows it takes forever. Is there a more efficient solution (possibly using data.table) to get the max value under this condition?
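for (i in seq_len(nrow(DT))) {
  if (is.na(DT$peak[i])) {
    next
  }
  if (DT$peak[i] == "peak") {
    # look at most 15000 rows ahead, but never past the last row
    e <- min(i + 15000, nrow(DT))
    for (j in i:e) {
      if (is.na(DT$trough[j])) {
        next
      }
      if (DT$trough[j] == "trough") {
        # max of pna from the peak up to the first trough that follows it
        x <- DT$pna[i:j]
        DT[i, max_insp := max(x)]
        break
      }
    }
  }
}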
Upvotes: 0
Views: 95
Reputation: 25225
Here is an option:
DT[, rn := .I]
# use a rolling join (roll = -Inf, i.e. next observation carried backward) to find the nearest following trough
DT[!is.na(peak), nt := DT[!is.na(trough)][.SD, on = .(rn), roll = -Inf, x.rn]]
# use a non-equi join to find the max of pna between each peak and its trough
DT[!is.na(peak), max_insp :=
  DT[.SD, on = .(rn >= rn, rn <= nt), by = .EACHI, max(x.pna)]$V1
]
Another option (might be faster if you have a lot of peaks and troughs but maybe less readable):
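# pix: row index of the most recent peak (last observation carried forward)
# tix: row index of the next trough (next observation carried backward)
DT[, c("pix", "tix") := .(
  nafill(replace(.I, is.na(peak), NA_integer_), "locf"),
  nafill(replace(.I, is.na(trough), NA_integer_), "nocb")
)]
# sort so the largest pna comes first within each (pix, tix) pair, then keep that first row
iv <- DT[order(pix, tix, -pna)][{
  ri <- rleid(pix, tix)
  ri != shift(ri, fill = 0L) & !is.na(pix) & !is.na(tix)
}]
DT[iv$pix, max_insp := iv$pna]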
output:
    peak trough          pna rn nt max_insp
 1: <NA>   <NA>  1.262954285  1 NA       NA
 2: peak   <NA> -0.326233361  2 11 2.404653
 3: <NA>   <NA>  1.329799263  3 NA       NA
 4: <NA>   <NA>  1.272429321  4 NA       NA
 5: <NA>   <NA>  0.414641434  5 NA       NA
 6: <NA>   <NA> -1.539950042  6 NA       NA
 7: <NA>   <NA> -0.928567035  7 NA       NA
 8: <NA>   <NA> -0.294720447  8 NA       NA
 9: <NA>   <NA> -0.005767173  9 NA       NA
10: <NA>   <NA>  2.404653389 10 NA       NA
11: <NA> trough  0.763593461 11 NA       NA
12: <NA>   <NA> -0.799009249 12 NA       NA
data:
library(data.table)
set.seed(0L)
DT <- data.table(peak = c(NA, "peak", rep(NA, 10)),
                 trough = c(rep(NA, 10), "trough", NA),
                 pna = rnorm(12))
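For a quick sanity check on a small sample, a brute-force reference can be compared against the max_insp column. This is only a sketch under assumptions: the names peak_rows, trough_rows, next_trough and max_check are made up for illustration, and it assumes each peak should be paired with the first trough at or after it. It loops over peaks in plain R, so it is meant for verification on small data rather than for millions of rows.

```r
library(data.table)

# same sample data as above
set.seed(0L)
DT <- data.table(peak = c(NA, "peak", rep(NA, 10)),
                 trough = c(rep(NA, 10), "trough", NA),
                 pna = rnorm(12))

# row indices of the event markers (which() drops the NA rows)
peak_rows <- which(DT$peak == "peak")
trough_rows <- which(DT$trough == "trough")

# findInterval() counts the troughs strictly before each peak;
# the next element of trough_rows is the first trough at or after the peak
# (NA when no such trough exists)
next_trough <- trough_rows[findInterval(peak_rows - 1L, trough_rows) + 1L]

# keep only peaks that actually have a following trough
ok <- !is.na(next_trough)
idx <- peak_rows[ok]

# brute-force max of pna over each peak-to-trough window
max_vals <- vapply(
  which(ok),
  function(k) max(DT$pna[peak_rows[k]:next_trough[k]]),
  numeric(1)
)

# hypothetical column name, used only for the comparison
DT[idx, max_check := max_vals]
```

If max_insp has already been computed by one of the options above, all.equal(DT$max_insp, DT$max_check) is one way to compare the two columns.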
Upvotes: 1