Brent Pease

Reputation: 62

Find the first value that meets a condition which also holds for all values after it

I would like to identify the first value in column b that is less than 1 and for which all following values are also less than 1 and less than or equal to it.

I have a data.table:

library(data.table)

stack <- data.table(a  = as.numeric(seq(1, 10, 1)),
                    b  = as.numeric(c(1.54, 1.17, 0.75, 1.65, 0.61, 0.31, 0.90, 0.07, 0.04, 0.01)),
                    ID = as.numeric(rep(seq(1, 2, 1), 5)))

stack
     a    b ID
 1:  1 1.54  1
 2:  2 1.17  2
 3:  3 0.75  1
 4:  4 1.65  2
 5:  5 0.61  1
 6:  6 0.31  2
 7:  7 0.90  1
 8:  8 0.07  2
 9:  9 0.04  1
10: 10 0.01  2

The value I am looking for in this example would be row 7:

   a    b ID
7: 7 0.90  1

This is the first value less than 1 where all values following are less than 1 and are also less than or equal to that value. I am specifically interested in returning the value from column a.
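For reference, here is a minimal base-R sketch that encodes the condition literally, one position at a time. The function name first_stable_below and its threshold argument are hypothetical, introduced only for illustration. Because it scans the rest of the vector for every element it is quadratic, so it is better treated as a specification to test faster versions against than as a solution.

```r
# Hypothetical helper: index of the first element of b that is below
# `threshold` and for which every later element is also below `threshold`
# and <= that element. Returns NA_integer_ if no element qualifies.
# Literal O(n^2) translation of the condition, for reference only.
first_stable_below <- function(b, threshold = 1) {
  n <- length(b)
  for (i in seq_len(n)) {
    rest <- b[seq_len(n) > i]   # values after position i (empty for the last)
    if (b[i] < threshold && all(rest < threshold) && all(rest <= b[i])) {
      return(i)
    }
  }
  NA_integer_
}

b <- c(1.54, 1.17, 0.75, 1.65, 0.61, 0.31, 0.90, 0.07, 0.04, 0.01)
first_stable_below(b)
# [1] 7
```

Since a equals the row number in this example, the returned index is also the wanted value of a.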

I have tried stack[, min(which(b < 1))], but this only finds the first value below 1 (row 3) and misses the additional conditions on the values that follow.
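One way to add the missing requirement without a loop, shown here as a base-R sketch of a suffix-maximum technique (not taken from either answer below): element i has no larger value after it exactly when b[i] equals max(b[i:n]), and rev(cummax(rev(b))) computes that running maximum from the end of the vector in a single pass.

```r
b <- c(1.54, 1.17, 0.75, 1.65, 0.61, 0.31, 0.90, 0.07, 0.04, 0.01)
a <- seq_along(b)

# The attempted approach: first value below 1, ignoring everything after it.
attempt <- min(which(b < 1))   # 3 (b = 0.75, but 1.65 follows it)

# Suffix maximum: suffix_max[i] = max(b[i:n]).
suffix_max <- rev(cummax(rev(b)))

# b[i] >= suffix_max[i] means nothing after position i is larger than b[i];
# combined with b[i] < 1, every later value is then also below 1.
idx <- which(b < 1 & b >= suffix_max)[1L]
a[idx]
# [1] 7
```

This is linear in the number of rows, whereas checking every suffix separately is quadratic.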

Upvotes: 2

Views: 160

Answers (2)

IceCreamToucan

Reputation: 28675

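# First row with b < 1 whose b is >= every later b
# (for the last row, min(.N, i + 1) makes it compare b[.N] with itself)
stack[which(b < 1 &
            sapply(seq_len(.N), 
                   function(i) all(b[min(.N, i + 1):nrow(stack)] <= b[i]))
            )[1]]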

If b[i] < 1 and b[i + x] <= b[i], then b[i + x] < 1 automatically, so there is no need to check b[i + x] < 1 separately.
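The argument can also be spot-checked numerically. In this base-R sketch, the hypothetical helper later_max gives the largest value after each position (-Inf for the last one, where nothing follows), and the per-element condition with and without the explicit "later values < 1" test is compared on random vectors.

```r
# Hypothetical helper: largest value strictly after each position
# (-Inf for the last position, where nothing follows).
later_max <- function(b) c(rev(cummax(rev(b)))[-1L], -Inf)

set.seed(42)
agree <- replicate(1000, {
  b     <- 1.5 * runif(10)                   # values on both sides of 1
  after <- later_max(b)
  full    <- b < 1 & after < 1 & after <= b  # explicit "later values < 1" test
  reduced <- b < 1 & after <= b              # that test dropped
  identical(full, reduced)
})
all(agree)
# [1] TRUE
```

A random check like this does not prove the claim, but the implication b[i + x] <= b[i] < 1 does, and the check confirms nothing was lost in translating it to code.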

Or, by ID:

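# Logical mask marking the first qualifying row within one group
fun <- function(b){
  N <- length(b)
  which(b < 1 &
        sapply(seq_len(N), 
               function(i) all(b[min(N, i + 1):N] <= b[i]))
        )[1] == seq_len(N)
}

# Sort by ID so the stacked per-group masks line up with the rows of stack
setorder(stack, ID)
stack[stack[, fun(b), by = ID]$V1]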
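The same suffix-maximum idea can be applied per group. This is a base-R sketch using split() instead of data.table's by = ID; the function name first_stable_below_by is hypothetical. It returns the value of a for each ID (NA when no row of that ID qualifies) and keeps the original row order within each group, so the table does not need to be sorted by ID first.

```r
# Hypothetical helper: for each group in `id`, the value of `a` at the first
# row whose b is below `threshold` and >= every later b in the same group.
first_stable_below_by <- function(a, b, id, threshold = 1) {
  sapply(split(seq_along(b), id), function(rows) {
    bb <- b[rows]                        # group's values, in original order
    suffix_max <- rev(cummax(rev(bb)))   # suffix_max[j] = max(bb[j:length(bb)])
    hit <- which(bb < threshold & bb >= suffix_max)[1L]
    if (is.na(hit)) NA else a[rows[hit]]
  })
}

a  <- 1:10
b  <- c(1.54, 1.17, 0.75, 1.65, 0.61, 0.31, 0.90, 0.07, 0.04, 0.01)
id <- rep(1:2, 5)
first_stable_below_by(a, b, id)
# named vector: ID 1 -> 7, ID 2 -> 6
```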

EDIT:

I cannot delete this post since it has been accepted, but I have realized this gives an incorrect answer in many cases, e.g. the one below. The other answer is correct (and much faster anyway).

set.seed(0)
DT <- data.table(a=1:10, b=1.1*runif(10))
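One way to catch cases like this is to compare a candidate against a brute-force reading of the condition on many random inputs of the same shape. The sketch below does that in base R for a suffix-maximum formulation; both helper names are hypothetical, and the same loop could wrap any other candidate function that returns a row index.

```r
# Literal O(n^2) reading of the condition (reference implementation).
brute <- function(b) {
  n <- length(b)
  ok <- sapply(seq_len(n), function(i) {
    rest <- b[seq_len(n) > i]
    b[i] < 1 && all(rest < 1) && all(rest <= b[i])
  })
  which(ok)[1L]
}

# Linear-time candidate based on the suffix maximum.
suffix_max_version <- function(b) {
  which(b < 1 & b >= rev(cummax(rev(b))))[1L]
}

# Same kind of input as in the edit above, over many seeds.
mismatches <- 0L
for (seed in 0:499) {
  set.seed(seed)
  b <- 1.1 * runif(10)
  if (!identical(brute(b), suffix_max_version(b))) mismatches <- mismatches + 1L
}
mismatches
# [1] 0
```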

Upvotes: 0

chinsoon12

Reputation: 25225

Another method:

library(data.table)

set.seed(0L)
M <- 1e4
DT <- data.table(a=1:M, b=10*runif(M))

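# Approach from the answer above: O(N^2), since every row scans all rows after it
mtd1 <- function() {
    DT[which(b < 1 &
            sapply(seq_len(.N), 
                function(i) all(b[min(.N, i + 1):nrow(DT)] <= b[i]))
    )[1]]
}

# Sort by b descending, then keep the first row with b < 1
mtd2 <- function() {
    DT[order(-b), .SD[b < 1][1L]]
}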

identical(mtd1(), mtd2())
#[1] TRUE

library(microbenchmark)
microbenchmark(mtd1(), mtd2(), times=3L)

Timings:

Unit: milliseconds
   expr      min       lq       mean   median        uq      max neval
 mtd1() 737.5113 754.3420 766.047900 771.1728 780.31620 789.4596     3
 mtd2()   1.6830   1.7687   3.118033   1.8544   3.83555   5.8167     3
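What mtd2 selects, written out in base R as a sketch (the function name largest_below_one is hypothetical): sorting by -b and taking the first row with b < 1 picks the row holding the largest b below 1, with ties going to the earliest row because order() leaves ties in their original order. That row meets the question's condition whenever no value of 1 or more comes after it, which holds in the question's data, where only 0.07, 0.04 and 0.01 follow row 7.

```r
# Base-R reading of DT[order(-b), .SD[b < 1][1L]]: index of the largest b
# below 1 (earliest row on ties, matching the stable ordering of order()).
largest_below_one <- function(b) {
  below <- which(b < 1)
  below[which.max(b[below])]
}

b <- c(1.54, 1.17, 0.75, 1.65, 0.61, 0.31, 0.90, 0.07, 0.04, 0.01)
largest_below_one(b)
# [1] 7
```

Sorting once and taking a single row avoids the per-row scan of the tail, which is consistent with the large gap in the timings above.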

Upvotes: 2
