Reputation: 39
The following problem:
I have a measurement series with temperatures (see temp.tbl
).
temp.tbl <- data.frame(temp = c(13, 14, 13, 11, 10, 12,
16, 18, 16, 13, 10, 11,
12, 14, 12, 10, 8, 7, 5)
)
Each time the temperature in the temp
column falls below a certain threshold (here: 12), I want to check how many subsequent temperature values remain within a temperature range (here: 9 - 15):
temp
column that falls below the threshold (12) and all subsequent values (within the range 9 - 15) will receive the value 1 in a new column (grp
).grp
column. In other words, these values are "combined" and assigned to group 1.grp
column. These values are assigned to group 2.grp
column.The solution would look something like this (see temp_solution.tbl
):
temp_solution.tbl <- data.frame(temp = c(13, 14, 13, 11, 10, 12, 16, 18, 16, 13, 10, 11, 12, 14, 12, 10, 8, 7, 5),
id = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 0, 0),
onOff = c("off", "off", "off", "on", "on", "on", "on", "off", "off", "off", "on",
"on", "on", "on", "on", "on", "on", "off", "off"),
startEnd = c("off", "off", "off", "start", "on", "on", "end", "off", "off", "off", "start",
"on", "on", "on", "on", "on", "end", "off", "off")
)
temp_solution.tbl
temp id onOff startEnd
1 13 0 off off
2 14 0 off off
3 13 0 off off
4 11 1 on start # temp below threshold
5 10 1 on on # temp within limits
6 12 1 on on # temp within limits
7 16 1 on end # first temp outside limits
8 18 0 off off
9 16 0 off off
10 13 0 off off
11 10 2 on start # temp below threshold
12 11 2 on on # temp within limits
13 12 2 on on
14 14 2 on on
15 12 2 on on
16 10 2 on on
17 8 2 on end # first temp outside limits
18 7 0 off off
19 5 0 off off
temp
of rows 1 - 3 are all above the threshold (12) and get the value 0 in column grp
.grp
.grp
column.grp
= 0.grp
= 2.grp
= 2.grp
= 2.grp
= 0The columns onOff
and startEnd
are only for illustration.
The starting point can be determined e.g. with: temp < 12 & lag(temp > 12)
.
The range again e.g. with: between(temp, 9, 15)
.
temp.tbl %>%
mutate(
start = temp < 12 & lag(temp > 12),
range = between(temp, 9, 15)
)
But if a start point was "triggered" I have problems to connect this start point with the following range condition to get the values to a coherent "measurement series" (switch is ON
).
In other words, if the value in the start
column changes from FALSE
to TRUE
, the switch is set to ON
. As long as the subsequent values in the range
column are TRUE
, the switch remains ON
; if the value in range
changes to FALSE
, the switch flips to OF
. All subsequent ON
s form a group and are numbered in ascending order.
Upvotes: 1
Views: 231
Reputation: 67818
Here I use data.table
functions, but it should be straightforward to translate it to base
or dplyr
.
Check if the current value is below threshold (temp < 12
) and (&
) the preceeding value is above threshold (shift(temp) > 12
). Create a grouping variable ('id') by using cumsum
on the logical result.
Check if values are between
the lower
and upper
limits ('rng').
Within each group (by = id
):
To include also the first value that is outside the limits, check if current 'rng' is FALSE
(!rng
) and the preceeding value is TRUE
(shift(rng)
).
To handles cases where values have been outside the limits and then return back within limits (e.g. row 10), use cummin
on the logical 'rng'. Thus, once a value has become FALSE
, it stays FALSE
. Coerce result to logical.
Check if either (|
) of the two tests is TRUE
and multiply with 'id'. The FALSE
rows will become zero.
library(data.table)
d = as.data.table(tmp.tbl)
d[ , id := cumsum(temp < 12 & shift(temp) > 12)]
d[ , rng := between(temp, lower = 9, upper = 15)]
d[ , id := id * ((!rng & shift(rng)) | as.logical(cummin(rng))), by = id]
temp id rng
<num> <int> <lgcl>
1: 13 0 TRUE
2: 14 0 TRUE
3: 13 0 TRUE
4: 11 1 TRUE
5: 10 1 TRUE
6: 12 1 TRUE
7: 16 1 FALSE
8: 18 0 FALSE
9: 16 0 FALSE
10: 13 0 TRUE
11: 10 2 TRUE
12: 11 2 TRUE
13: 12 2 TRUE
14: 14 2 TRUE
15: 12 2 TRUE
16: 10 2 TRUE
17: 8 2 FALSE
18: 7 0 FALSE
19: 5 0 FALSE
Upvotes: 0