Reputation: 23
I would like to find the next occurrence ('Time') where minimum value ('MinValue') is breached.
For example, given the following sample of data:
Time <- c(1:10)
Value <- c(9,7,10,6,7,7,10,9,6,10)
MinValue <- c(5,6,8,4,6,6,8,6,4,9)
SampleData <- data.frame(Time, Value, MinValue)
I would like to achieve the following:
I would like to add column ('TimeMinValue') which show at what time the minimum value of the current row is breached.
Thank you all very much for your help! I have tried to look for my answer online but couldn't find anything. However, if I have accidentally duplicated a question then I apologise.
Upvotes: 0
Views: 121
Reputation: 42574
Alternatively, this can be solved by a self-join.
Here are two different implementations using
data.table
SQL
/ sqldf()
data.table
The data.table
version is a "one-liner" which aggregates in a non-equi self join:
library(data.table)
setDT(SampleData)[
, TimeMinValue := .SD[.SD, on = .(Time > Time, Value <= MinValue), .(min(x.Time)), by = .EACHI]$V1][]
Time Value MinValue TimeMinValue 1: 1 9 5 NA 2: 2 7 6 4 3: 3 10 8 4 4: 4 6 4 NA 5: 5 7 6 9 6: 6 7 6 9 7: 7 10 8 9 8: 8 9 6 9 9: 9 6 4 NA 10: 10 10 9 NA
SQL
/ sqldf()
I was curious how this can be expressed in SQL syntax. The code is more verbose because we have to join twice:
MinValue
with Value
in subsequent rowsTimeMinValue
for each Time
library(sqldf)
sqldf(
"SELECT sd1.*, min(t) TimeMinValue
FROM SampleData sd1
LEFT JOIN ( SELECT sd1.Time,
sd2.Time t
FROM SampleData sd1
JOIN SampleData sd2
WHERE sd1.Time < sd2.Time AND
sd1.MinValue >= sd2.Value
)
USING (Time)
GROUP BY Time
")
Time Value MinValue TimeMinValue 1 1 9 5 NA 2 2 7 6 4 3 3 10 8 4 4 4 6 4 NA 5 5 7 6 9 6 6 7 6 9 7 7 10 8 9 8 8 9 6 9 9 9 6 4 NA 10 10 10 9 NA
SQL
versionG. Grothendiek has suggested a streamlined SQL version which is much closer to the data.table
version:
sqldf(
"SELECT a.*, min(b.Time) TimeMinValue
FROM SampleData a
LEFT JOIN SampleData b
ON a.Time < b.Time AND
a.MinValue >= b.Value
GROUP BY a.Time")
Upvotes: 1
Reputation: 14774
Surely there's a less convoluted approach, but one option would be sapply
over rows with which
:
transform(
SampleData,
TimeMinValue = sapply(
seq_len(nrow(SampleData)),
function(x)
Time[
which(Value <= MinValue[x])[which(Value <= MinValue[x]) > x][1]
]))
Output:
Time Value MinValue TimeMinValue
1 1 9 5 NA
2 2 7 6 4
3 3 10 8 4
4 4 6 4 NA
5 5 7 6 9
6 6 7 6 9
7 7 10 8 9
8 8 9 6 9
9 9 6 4 NA
10 10 10 9 NA
Upvotes: 1