colton
colton

Reputation: 411

R Tidy : Dynamic Sequential Threshold

I'm trying to find a tidy way to dynamically adjust a threshold as I "move" through a tibble using library(tidyverse). For example, imagine a tibble containing sequential observations:

example <- 
  tibble(observed = c(2,1,1,2,2,4,10,4,2,2,3))
example
# A tibble: 11 x 1
   observed
      <dbl>
 1        2
 2        1
 3        1
 4        2
 5        2
 6        4
 7       10
 8        4
 9        2
10        2
11        3

I'm trying to calculate a threshold that starts with the initial value (2) and increments by a prespecified amount (in this case, 1) unless the current observation is greater than that threshold in which case the current observation becomes the reference threshold and further thresholds increment from it. Here is what the final tibble would look like:

answer <- 
  example %>%
  mutate(threshold = c(2,3,4,5,6,7,10,11,12,13,14))
answer
# A tibble: 11 x 2
   observed threshold
      <dbl>     <dbl>
 1        2         2
 2        1         3
 3        1         4
 4        2         5
 5        2         6
 6        4         7
 7       10        10
 8        4        11
 9        2        12
10        2        13
11        3        14

I'm looking for the best way to do this using dplyr/tidy. All help is appreciated!

EDIT:

The answers so far are very close, but miss in the case that the observed values drop and increase again. For example consider the same tibble as example above, but with a 4 instead of a 3 for the final observation:

example <- 
  tibble(observed = c(2,1,1,2,2,4,10,4,2,2,4))
example
# A tibble: 11 x 1
   observed
      <dbl>
 1        2
 2        1
 3        1
 4        2
 5        2
 6        4
 7       10
 8        4
 9        2
10        2
11        4

The diff & cumsum method then gives:

example %>%
  group_by(gr = cumsum(c(TRUE, diff(observed) > thresh))) %>%
  mutate(thresold = first(observed) + row_number() - 1) %>%
  ungroup %>%
  select(-gr)

A tibble: 11 x 2
   observed thresold
      <dbl>    <dbl>
 1        2        2
 2        1        3
 3        1        4
 4        2        5
 5        2        6
 6        4        4
 7       10       10
 8        4       11
 9        2       12
10        2       13
11        4        4

Where the final threshold value is incorrect.

Upvotes: 0

Views: 122

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389355

You could use diff to create groups and add row number in the group to the first value.

library(dplyr)
thresh <- 1

example %>%
   group_by(gr = cumsum(c(TRUE, diff(observed) > thresh))) %>%
   mutate(thresold = first(observed) + row_number() - 1) %>%
   ungroup %>%
   select(-gr)

# A tibble: 11 x 2
#   observed thresold
#      <dbl>    <dbl>
# 1        2        2
# 2        1        3
# 3        1        4
# 4        2        5
# 5        2        6
# 6        4        4
# 7       10       10
# 8        4       11
# 9        2       12
#10        2       13
#11        3       14

To understand how the groups are created here are the detailed steps :

We first calculate the difference between consecutive values

diff(example$observed) 
#[1] -1  0  1  0  2  6 -6 -2  0  1

Note that diff gives output of length 1 less than the actual length.

We compare it with thresh which gives TRUE for every time we have value greater than the threshold

diff(example$observed) > thresh
 #[1] FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

Now since output of diff has one value less we add one value as TRUE

c(TRUE, diff(example$observed) > thresh)
# [1]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE

and then finally take cumsum to create groups which is used in group_by.

cumsum(c(TRUE, diff(example$observed) > thresh))
# [1] 1 1 1 1 1 2 3 3 3 3 3

EDIT

For the updated question we can add another condition to check of the previous value is greater than the current count and update the values accordingly.

example %>%
  group_by(gr = cumsum(c(TRUE, diff(observed) > thresh) & 
                observed > first(observed) + row_number())) %>%
  mutate(thresold = first(observed) + row_number() - 1) %>%
  ungroup() %>%
  select(-gr)

# A tibble: 11 x 2
#   observed thresold
#      <dbl>    <dbl>
# 1        2        2
# 2        1        3
# 3        1        4
# 4        2        5
# 5        2        6
# 6        4        7
# 7       10       10
# 8        4       11
# 9        2       12
#10        2       13
#11        4       14

Upvotes: 1

colton
colton

Reputation: 411

I think I've figured out a way to do this, by utilizing zoo::locf (although I'm not sure this part is really necessary).

First create the harder of the two examples I've listed in my description:

example2 <- 
  tibble(observed = c(2,1,1,2,2,4,10,4,2,2,4))

example2 %>%
  mutate(def = first(observed) + row_number() - 1) %>%
  mutate(t1 = pmax(observed,def)) %>%
  mutate(local_maxima = ifelse(observed == t1,t1,NA)) %>%
  mutate(groupings = zoo::na.locf(local_maxima)) %>%
  group_by(groupings) %>%
  mutate(threshold = groupings + row_number() - 1) %>%
  ungroup() %>%
  select(-def,-t1,-local_maxima,-groupings)

Result:

# A tibble: 11 x 2
   observed threshold
      <dbl>     <dbl>
 1        2         2
 2        1         3
 3        1         4
 4        2         5
 5        2         6
 6        4         7
 7       10        10
 8        4        11
 9        2        12
10        2        13
11        4        14

I'd definitely prefer a more elegant solution if anyone finds one.

Upvotes: 0

akrun
akrun

Reputation: 887991

We can create the grouping variable with lag of the column difference

library(dplyr)
thresh <- 1
example %>%
   group_by(grp = cumsum((observed - lag(observed, default = first(observed)) >
             thresh))) %>%
   mutate(threshold = observed[1] + row_number()  - 1) %>%
   ungroup %>%
          mutate(new = row_number() + 1, 
    threshold = pmax(threshold, new)) %>%     
   select(-grp, -new)
# A tibble: 11 x 2
#   observed threshold
#      <dbl>     <dbl>
# 1        2         2
# 2        1         3
# 3        1         4
# 4        2         5
# 5        2         6
# 6        4         7
# 7       10        10
# 8        4        11
# 9        2        12
#10        2        13
#11        3        14

Upvotes: 1

Related Questions