Rapha

Reputation: 3

How to avoid this for loop in R

I am trying to obtain the max values in column DT$pna between the peak and trough events marked in their respective columns of a data.table (i.e. DT$peak, DT$trough). The DT$peak and DT$trough columns contain the strings "peak" and "trough" to mark the beginning and end of successive events. This for loop works on a very reduced sample, but because the data.table has millions of rows it takes forever. Is there a better solution (possibly using data.table) that would be more efficient at getting the max value under this condition?

for (i in 1:nrow(DT)) {
  if(is.na(DT$peak[i])) {
    next
  }
  if(DT$peak[i] == "peak") {
    e <- i + 15000
    for (j in i:e) {
      if(is.na(DT$trough[j])) {
        next
      }
      if(DT$trough[j] == "trough") {
        x <- (DT$pna[i:j])
      }
    }  
  }
  DT[i, max_insp := max(x)]
}

Upvotes: 0

Views: 95

Answers (1)

chinsoon12

Reputation: 25225

Here is an option:

DT[, rn := .I]

#use rolling join to find the nearest trough
DT[!is.na(peak), nt := DT[!is.na(trough)][.SD, on=.(rn), roll=-Inf, x.rn]]

#use non-equi join to find the max
DT[!is.na(peak), max_insp :=
    DT[.SD, on=.(rn>=rn, rn<=nt), by=.EACHI, max(x.pna)]$V1
]

Another option (might be faster if you have a lot of peaks and troughs but maybe less readable):

DT[, c("pix", "tix") := .(nafill(replace(.I, is.na(peak), NA_integer_), "locf"), 
  nafill(replace(.I, is.na(trough), NA_integer_), "nocb"))]

iv <- DT[order(pix, tix, -pna)][{
    ri <- rleid(pix, tix)
    ri!=shift(ri, fill=0L) & !is.na(pix) & !is.na(tix)
  }]

DT[iv$pix, max_insp := iv$pna]
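
The second option relies on how nafill carries row indices forward ("locf") and backward ("nocb"). A minimal sketch on a toy integer vector may make that clearer; the names idx, filled_fwd and filled_bwd are illustrative and not part of the answer's code:

```r
library(data.table)

# Toy vector: row indices are known only at positions 2 and 5
idx <- c(NA, 2L, NA, NA, 5L, NA)

# "locf": last observation carried forward (leading NA stays NA)
filled_fwd <- nafill(idx, "locf")

# "nocb": next observation carried backward (trailing NA stays NA)
filled_bwd <- nafill(idx, "nocb")

print(filled_fwd)  # NA  2  2  2  5  5
print(filled_bwd)  #  2  2  5  5  5 NA
```

Applied to the peak and trough row numbers, this tags every row with the latest peak at or before it (pix) and the next trough at or after it (tix), so rows sharing the same (pix, tix) pair belong to the same interval.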

output:

    peak trough          pna rn nt max_insp
 1: <NA>   <NA>  1.262954285  1 NA       NA
 2: peak   <NA> -0.326233361  2 11 2.404653
 3: <NA>   <NA>  1.329799263  3 NA       NA
 4: <NA>   <NA>  1.272429321  4 NA       NA
 5: <NA>   <NA>  0.414641434  5 NA       NA
 6: <NA>   <NA> -1.539950042  6 NA       NA
 7: <NA>   <NA> -0.928567035  7 NA       NA
 8: <NA>   <NA> -0.294720447  8 NA       NA
 9: <NA>   <NA> -0.005767173  9 NA       NA
10: <NA>   <NA>  2.404653389 10 NA       NA
11: <NA> trough  0.763593461 11 NA       NA
12: <NA>   <NA> -0.799009249 12 NA       NA

data:

library(data.table)
set.seed(0L)
DT <- data.table(peak=c(NA, "peak", rep(NA, 10)), 
    trough=c(rep(NA, 10), "trough", NA),
    pna=rnorm(12))
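
As a cross-check on this sample data, here is a minimal base R sketch that loops only over the peak rows rather than over every row. It is not one of the two options above, just a simple reference to compare results against; the names peak_rows, trough_rows, next_trough and expected are made up for illustration:

```r
library(data.table)
set.seed(0L)
DT <- data.table(peak=c(NA, "peak", rep(NA, 10)),
    trough=c(rep(NA, 10), "trough", NA),
    pna=rnorm(12))

# Row numbers where a peak or a trough is marked
peak_rows <- which(!is.na(DT$peak))
trough_rows <- which(!is.na(DT$trough))

# For each peak, the first trough at or after it (NA if there is none)
next_trough <- trough_rows[findInterval(peak_rows - 1L, trough_rows) + 1L]

# Max of pna between each peak and its trough, inclusive
expected <- vapply(seq_along(peak_rows), function(k) {
    if (is.na(next_trough[k])) return(NA_real_)
    max(DT$pna[peak_rows[k]:next_trough[k]])
}, numeric(1))

print(data.table(rn = peak_rows, nt = next_trough, max_insp = expected))
```

On the sample data this should give the same row as the output above: the peak at row 2, the trough at row 11, and a max_insp of about 2.404653.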

Upvotes: 1
