ry13
ry13

Reputation: 23

How to find the time where certain value is breached in R?

I would like to find the next occurrence ('Time') where minimum value ('MinValue') is breached.

For example, given the following sample of data:

Time <- c(1:10)
Value <- c(9,7,10,6,7,7,10,9,6,10)
MinValue <- c(5,6,8,4,6,6,8,6,4,9)
SampleData <- data.frame(Time, Value, MinValue)

I would like to achieve the following:

enter image description here

I would like to add column ('TimeMinValue') which show at what time the minimum value of the current row is breached.

Thank you all very much for your help! I have tried to look for my answer online but couldn't find anything. However, if I have accidentally duplicated a question then I apologise.

Upvotes: 0

Views: 121

Answers (2)

Uwe
Uwe

Reputation: 42574

Alternatively, this can be solved by a self-join.

Here are two different implementations using

  1. data.table
  2. SQL / sqldf()

data.table

The data.table version is a "one-liner" which aggregates in a non-equi self join:

library(data.table)
setDT(SampleData)[
  , TimeMinValue := .SD[.SD, on = .(Time > Time, Value <= MinValue), .(min(x.Time)), by = .EACHI]$V1][]
    Time Value MinValue TimeMinValue
 1:    1     9        5           NA
 2:    2     7        6            4
 3:    3    10        8            4
 4:    4     6        4           NA
 5:    5     7        6            9
 6:    6     7        6            9
 7:    7    10        8            9
 8:    8     9        6            9
 9:    9     6        4           NA
10:   10    10        9           NA

SQL / sqldf()

I was curious how this can be expressed in SQL syntax. The code is more verbose because we have to join twice:

  1. a self-join which compares MinValue with Value in subsequent rows
  2. a left join to complete rows and to find the TimeMinValue for each Time
library(sqldf)
sqldf(
  "SELECT sd1.*, min(t) TimeMinValue
   FROM SampleData sd1 
   LEFT JOIN ( SELECT sd1.Time, 
                      sd2.Time t 
          FROM SampleData sd1 
          JOIN SampleData sd2
          WHERE sd1.Time < sd2.Time AND
                sd1.MinValue >= sd2.Value
        )
   USING (Time)
   GROUP BY Time  
")
   Time Value MinValue TimeMinValue
1     1     9        5           NA
2     2     7        6            4
3     3    10        8            4
4     4     6        4           NA
5     5     7        6            9
6     6     7        6            9
7     7    10        8            9
8     8     9        6            9
9     9     6        4           NA
10   10    10        9           NA

EDIT: Improved SQL version

G. Grothendiek has suggested a streamlined SQL version which is much closer to the data.table version:

sqldf(
  "SELECT a.*, min(b.Time) TimeMinValue
   FROM SampleData a
   LEFT JOIN SampleData b
   ON a.Time < b.Time AND
      a.MinValue >= b.Value
   GROUP BY a.Time")

Upvotes: 1

arg0naut91
arg0naut91

Reputation: 14774

Surely there's a less convoluted approach, but one option would be sapply over rows with which:

transform(
  SampleData,
  TimeMinValue = sapply(
    seq_len(nrow(SampleData)),
    function(x)
      Time[
        which(Value <= MinValue[x])[which(Value <= MinValue[x]) > x][1]
        ]))

Output:

   Time Value MinValue TimeMinValue
1     1     9        5           NA
2     2     7        6            4
3     3    10        8            4
4     4     6        4           NA
5     5     7        6            9
6     6     7        6            9
7     7    10        8            9
8     8     9        6            9
9     9     6        4           NA
10   10    10        9           NA

Upvotes: 1

Related Questions